PREFACE


An empirical process that assigns possibly different non-random
(random) weights to different observations is called a weighted (randomly
weighted) empirical process. These processes are as basic to linear regression
and autoregression models as the ordinary empirical process is to one-sample
models. However, their usefulness in studying linear regression and
autoregression models has not been fully exploited. This monograph
addresses this question to a large extent.

There is a vast literature in Nonparametric Inference that discusses 
inferential procedures based on empirical processes in k-sample location 
models. However, their analogs in autoregression and linear regression 
models are not readily accessible. This monograph makes an attempt to fill 
this void. The statistical methodologies studied here extend to these models 
many of the known results in k-sample location models, thereby giving a 
unified theory. 

By viewing linear regression models via certain weighted empirical 
processes one is naturally led to new and interesting inferential procedures. 
Examples include minimum distance estimators of regression parameters and 
goodness-of-fit tests pertaining to the errors in linear models. Similarly, by 
viewing autoregression models via certain randomly weighted empirical 
processes one is naturally led to classes of minimum distance estimators of 
autoregression parameters and goodness-of-fit tests pertaining to the error 
distribution. 

The introductory Chapter 1 gives an overview of the usefulness of 
weighted and randomly weighted empirical processes in linear models. 
Chapter 2 gives general sufficient conditions for the weak convergence of 
suitably standardized versions of these processes to continuous Gaussian 
processes. This chapter also contains the proof of the asymptotic uniform 
linearity of weighted empirical processes based on the residuals when errors 
are heteroscedastic and independent. Chapter 3 discusses the asymptotic 
uniform linearity of linear rank and signed rank statistics when errors are 
heteroscedastic and independent. It also includes some results about the 
weak convergence of weighted empirical processes of ranks and signed ranks. 
Chapter 4 is devoted to the study of the asymptotic behavior of M- and R- 
estimators of regression parameters under heteroscedastic and independent 
errors, via weighted empirical processes. A brief discussion about bootstrap 
approximations to the distribution of a class of M-estimators appears in 
Section 4.2b. This chapter also contains a proof of the consistency of a class 
of robust estimators for certain scale parameters under heteroscedastic errors. 

In carrying out the analysis of variance of linear regression models
based on ranks, one often needs an estimator of the functional $\int f\,d\varphi(F)$, where
F is the error distribution function, f its density and $\varphi$ is a function from
[0, 1] to the real line. Some estimators of this functional and the proofs of
their consistency in the linear regression setting appear in Section 4.5.


Chapters 5 and 6 deal with minimum distance estimation, via 
weighted empirical processes, of the regression parameters and tests of 
goodness-of-fit pertaining to the error distribution. One of the main themes 
emerging from these two chapters is that the inferential procedures based on 
weighted empiricals with weights proportional to the design matrix provide 
the right extensions of k-sample location model procedures to linear 
regression models. 

It is customary to expect that a method that works for linear 
regression models should have an analogue that will also work in 
autoregression models. Indeed many of the inferential procedures based on 
weighted empirical processes in linear regression that are discussed in 
Chapters 3-6 have precise analogs in autoregression based on certain 
randomly weighted empirical processes and appear in Chapter 7. In 
particular, the proof of the asymptotic uniform linearity of the ordinary 
empirical process of the residuals in autoregression appears here. 

All asymptotic uniform linearity results in the monograph are shown
to be consequences of the asymptotic continuity of certain basic weighted
and randomly weighted empirical processes.

Chapters 2-4 are interdependent. Chapter 5 is mostly self-contained 
and can be read after reading the Introduction. Chapter 6 uses results from 
Chapters 2 and 5. Chapter 7 is almost self-contained. The basic result 
needed for this chapter appears in Section 2.2b. 

The first version of this monograph was prepared while I was visiting 
the Department of Statistics, Poona University, India, on sabbatical leave 
from Michigan State University, during the academic year 1982-83. Several 
lectures on some parts of this monograph were given at the Indian Statistical 
Institute, New Delhi, and Universities of La Trobe, Australia, and 
Wisconsin, Madison. I wish to thank Professors S. R. Adke, Richard 
Johnson, S. K. Mitra, M. S. Prasad and B. L. S. Prakasa Rao for having 
some discussions pertaining to the monograph. My special thanks go to 
James Hannan for encouraging me to finish the project and for proofreading
parts of the manuscript, to Soumendra Lahiri for helping me with sections on 
bootstrapping, and to Bob Serfling for taking keen interest in the monograph 
and for many comments that helped to improve the initial draft. 
NOTATION AND CONVENTIONS


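The p-dimensional Euclidean space is denoted by $\mathbb{R}^p$, $p \ge 1$; $\mathbb{R} = \mathbb{R}^1$. $\mathcal{B}^p$ := the
$\sigma$-algebra of Borel sets in $\mathbb{R}^p$, $\mathcal{B} = \mathcal{B}^1$. $\lambda$ := Lebesgue measure on $(\mathbb{R}, \mathcal{B})$.
The symbol ":=" stands for "by definition".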


For any set $A \subset \mathbb{R}$, $\mathcal{D}(A)$ denotes the class of real valued functions on
A that are right continuous and have left limits, while $\mathcal{DI}(A)$ denotes the
subclass of $\mathcal{D}(A)$ whose members are nondecreasing. $\mathcal{C}[0, 1]$ := the class of
real valued bounded continuous functions on [0, 1].


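A vector or a matrix will be designated by a bold letter. A $\mathbf{t} \in \mathbb{R}^p$ is
a $p \times 1$ vector, $\mathbf{t}'$ its transpose, $\|\mathbf{t}\|^2 := \sum_{j=1}^p t_j^2$, $|\mathbf{t}| := \max\{|t_j|, 1 \le j \le p\}$.
For any $p \times p$ matrix $\mathbf{C}$, $\|\mathbf{C}\| := \sup\{\|\mathbf{t}'\mathbf{C}\|;\ \|\mathbf{t}\| \le 1\}$. For an $n \times p$
matrix $\mathbf{D}$, $\mathbf{d}_{ni}'$ denotes its ith row, $1 \le i \le n$, and $\mathbf{D}_c$ the $n \times p$ matrix $\mathbf{D} - \bar{\mathbf{D}}$,
whose ith row consists of $(\mathbf{d}_{ni} - \bar{\mathbf{d}}_n)'$, with $\bar{\mathbf{d}}_n := \sum_i \mathbf{d}_{ni}/n$, $1 \le i \le n$.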


w.e.p.('s) := weighted empirical process(es).

r.w.e.p.('s) := randomly weighted empirical process(es).

i.i.d. := independent identically distributed.

r.v.('s) := random variable(s).

d.f.('s) := distribution function(s).

w.r.t. := with respect to.

C-S := the Cauchy-Schwarz inequality.

D.C.T. := the Dominated Convergence Theorem.

Fubini := the Fubini Theorem.

L-F CLT := the Lindeberg-Feller Central Limit Theorem.

$o(1)$ ($o_p(1)$) := a sequence of numbers (r.v.'s) converging to zero (in
probability).

$O(1)$ ($O_p(1)$) := a sequence of numbers (r.v.'s) that is bounded (in
probability).

$N(\mathbf{0}, \mathbf{C})$ := either a r.v. with normal distribution whose mean
vector is $\mathbf{0}$ and whose covariance matrix is $\mathbf{C}$, or
the corresponding distribution.

$\|g\|_\infty$ := the supremum norm over the domain of g, g a real
valued function.

$\tau_a^2 := \sum_{i=1}^n a_{ni}^2$, for an arbitrary real vector $(a_{n1}, \ldots, a_{nn})'$.


Often in a discussion or in a proof the subscript n on the triangular
arrays and various other quantities will not be exhibited. The index i in
$\sum_i$ or $\sum$ and $\max_i$ or $\max$ will vary from 1 to n, unless specified
otherwise. All limits, unless specified otherwise, are taken as $n \to \infty$.


For a sequence of r.v.'s $\{X, X_n, n \ge 1\}$, $X_n \to_d X$ means that the
distribution of $X_n$ converges weakly to that of X. For two r.v.'s X, Y,
$X =_d Y$ means that the distribution of X is the same as that of Y.

For a sequence of stochastic processes $\{Y, Y_n, n \ge 1\}$, $Y_n \Rightarrow Y$
means that $Y_n$ converges weakly to Y in a given topology. $Y_n \to_{fd} Y$
means that all finite dimensional distributions of $Y_n$ converge weakly to
those of Y.


Reference to an expression or a display is made by the (expression 
number) if referring in the same section and by the (chapter number.section 
number.expression number), otherwise. For example, by (3.2.1) is meant an 
expression (1) of Section 2 of Chapter 3. A reference to this while in Section 
3.2 would appear as (1). 

For convenient reference we list here some of the most often used
conditions in the manuscript. For an arbitrary d.f. F on $\mathbb{R}$, conditions (F1),
(F2) and (F3) are as follows:

(F1) F has a uniformly continuous density f w.r.t. $\lambda$.
(F2) $f > 0$, a.e. $\lambda$.
(F3) $\sup_{x \in \mathbb{R}} |x f(x)| < \infty$.

These conditions are introduced for the first time just before Corollary 3.2.1
and are used frequently thereafter.

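For an $n \times p$ design matrix $\mathbf{X}$, the conditions (NX), (NX1)
and (NX$_c$) are as follows:

(NX) $(\mathbf{X}'\mathbf{X})^{-1}$ exists, $n > p$; $\max_i \mathbf{x}_{ni}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_{ni} = o(1)$.

(NX1) $(\mathbf{X}_c'\mathbf{X}_c)^{-1}$ exists, $n > p$;
$\max_i \mathbf{x}_{ni}'(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{x}_{ni} = o(1)$.

(NX$_c$) $(\mathbf{X}_c'\mathbf{X}_c)^{-1}$ exists, $n > p$;
$\max_i (\mathbf{x}_{ni} - \bar{\mathbf{x}}_n)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{ni} - \bar{\mathbf{x}}_n) = o(1)$.

The condition (NX) is the one most often used from Theorem 2.3.3 onwards.
The letter N in these conditions stands for Noether, who was the first
person to use (NX), in the case p = 1, to obtain the asymptotic normality of
weighted sums of r.v.'s; see Noether (1949).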
CHAPTER 1
INTRODUCTION 


1.1. WEIGHTED EMPIRICAL PROCESSES 


A weighted empirical process (w.e.p.) corresponding to the random variables
(r.v.'s) $X_{n1}, \ldots, X_{nn}$ and the non-random real weights $d_{n1}, \ldots, d_{nn}$ is
defined to be

$$U_d(x) := \sum_{i=1}^n d_{ni}\, I(X_{ni} \le x), \quad x \in \mathbb{R},\ n \ge 1.$$

The weights $\{d_{ni}\}$ need not be nonnegative.
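For one fixed n, $U_d$ is a weighted count, so it can be evaluated pointwise directly from the definition. The following minimal Python sketch uses only the standard library. The function name `weighted_empirical` is illustrative and does not come from the text. Taking weights $1/n$ gives the ordinary empirical d.f., and signed weights are allowed.

```python
from typing import Sequence


def weighted_empirical(x: float, X: Sequence[float], d: Sequence[float]) -> float:
    """Evaluate U_d(x) = sum_i d_ni * I(X_ni <= x) for one fixed n."""
    if len(X) != len(d):
        raise ValueError("X and d must both have length n")
    # Only observations at or below x contribute their (possibly negative) weights.
    return sum(di for xi, di in zip(X, d) if xi <= x)


X = [0.3, -1.2, 2.5, 0.3]
print(weighted_empirical(0.3, X, [0.25] * 4))             # ordinary empirical d.f. at 0.3: 0.75
print(weighted_empirical(0.3, X, [1.0, -2.0, 0.5, 1.0]))  # signed weights: 0.0
```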

The classical example of a w.e.p. is the ordinary empirical process,
which corresponds to $d_{ni} \equiv n^{-1}$. Another example is given by the two-sample
empirical process obtained as follows. Let m be an integer, $1 \le m < n$,
$\tau := n - m$; $d_{ni} = -\tau/n$, $1 \le i \le m$; $d_{ni} = m/n$, $m+1 \le i \le n$. Then the
corresponding $U_d$-process becomes

$$U_d(x) = (m\tau/n)\Big\{\tau^{-1}\sum_{i=m+1}^n I(X_{ni} \le x) - m^{-1}\sum_{i=1}^m I(X_{ni} \le x)\Big\}, \quad x \in \mathbb{R},$$

precisely the process that arises in two-sample models.
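The two-sample identity above can be checked numerically. The sketch below uses exact rational arithmetic (`fractions.Fraction`), so the comparison is exact. The helper names are hypothetical. The first m observations play the role of the first sample.

```python
from fractions import Fraction
from typing import List, Sequence


def two_sample_weights(n: int, m: int) -> List[Fraction]:
    """d_ni = -tau/n for i <= m and m/n for i > m, with tau = n - m."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    tau = n - m
    return [Fraction(-tau, n)] * m + [Fraction(m, n)] * tau


def U_d(x: float, X: Sequence[float], d: Sequence[Fraction]) -> Fraction:
    # Weighted count of observations at or below x.
    return sum((di for xi, di in zip(X, d) if xi <= x), Fraction(0))


def two_sample_form(x: float, X: Sequence[float], m: int) -> Fraction:
    """(m tau / n) { tau^{-1} #(sample 2 <= x) - m^{-1} #(sample 1 <= x) }."""
    n, tau = len(X), len(X) - m
    first = sum(1 for xi in X[:m] if xi <= x)
    second = sum(1 for xi in X[m:] if xi <= x)
    return Fraction(m * tau, n) * (Fraction(second, tau) - Fraction(first, m))


X = [0.4, 1.7, -0.2, 0.9, 2.2, -1.0, 0.5]  # first m = 3 values form sample 1
d = two_sample_weights(len(X), 3)
for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
    assert U_d(x, X, d) == two_sample_form(x, X, 3)
```

The weights sum to zero, which is why $U_d(x) \to 0$ as $x \to \infty$.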
More generally, weighted empirical processes (w.e.p.'s) arise naturally
in linear regression models where, for each $n \ge 1$ and each $\boldsymbol\beta \in \mathbb{R}^p$, the data
$\{(\mathbf{x}_{ni}, Y_{ni}), 1 \le i \le n\}$ are related to the error variables $\{e_{ni}, 1 \le i \le n\}$ by the
linear relation

(1) $$Y_{ni} = \mathbf{x}_{ni}'\boldsymbol\beta + e_{ni}, \quad 1 \le i \le n.$$

Here $e_{n1}, \ldots, e_{nn}$ are independent r.v.'s with respective continuous d.f.'s
$F_{n1}, \ldots, F_{nn}$, $\mathbf{x}_{ni}' = (x_{ni1}, \ldots, x_{nip})$ is the ith row of the known $n \times p$ design
matrix $\mathbf{X}$ and $\boldsymbol\beta$ is the parameter vector of interest.
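Consider the vector of w.e.p.'s $\mathbf{V} := (V_1, \ldots, V_p)'$ where

(2) $$V_j(y, \mathbf{t}) := \sum_{i=1}^n x_{nij}\, I(Y_{ni} \le y + \mathbf{x}_{ni}'\mathbf{t}), \quad y \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p,\ 1 \le j \le p.$$

Clearly, $V_j(\cdot, \mathbf{t})$ is an example of the w.e.p. $U_d(\cdot)$ with $d_{ni} = x_{nij}$
and $X_{ni} = Y_{ni} - \mathbf{x}_{ni}'\mathbf{t}$, $1 \le i \le n$, $1 \le j \le p$.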
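To make the connection between (1) and (2) concrete, the sketch below evaluates the vector $\mathbf{V}(y, \mathbf{t})$ from data for a given design. The function `V` and the toy data are illustrative only. Once y is large enough that every observation is counted, $\mathbf{V}(y, \mathbf{t})$ equals the column sums $\mathbf{X}'\mathbf{1}$ of the design.

```python
from typing import List, Sequence


def V(y: float, t: Sequence[float], Y: Sequence[float],
      Xmat: Sequence[Sequence[float]]) -> List[float]:
    """Return (V_1(y,t), ..., V_p(y,t)), V_j(y,t) = sum_i x_nij I(Y_ni <= y + x_ni' t)."""
    p = len(t)
    out = [0.0] * p
    for yi, xi in zip(Y, Xmat):
        fit = sum(a * b for a, b in zip(xi, t))   # x_ni' t
        if yi <= y + fit:                         # i.e. residual Y_ni - x_ni' t <= y
            for j in range(p):
                out[j] += xi[j]                   # add the whole ith design row
    return out


Xmat = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]   # first column: intercept
Y = [1.0, 0.0, 3.0]
print(V(1.0, [0.0, 0.0], Y, Xmat))    # rows 1 and 2 counted: [2.0, -0.5]
print(V(10.0, [0.0, 0.0], Y, Xmat))   # all rows counted, i.e. X'1: [3.0, 1.5]
```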


Observe that the data $\{(\mathbf{x}_{ni}, Y_{ni}), 1 \le i \le n\}$ in the model (1) are
readily summarized by the vector of w.e.p.'s $\{\mathbf{V}(y, \mathbf{0}), y \in \mathbb{R}\}$ in the sense
that the given data can be recovered from the sample paths of this vector up
to a permutation. This in turn suffices for the purpose of inference about $\boldsymbol\beta$
in (1). In this sense the vector of w.e.p.'s $\{\mathbf{V}(y, \mathbf{0}), y \in \mathbb{R}\}$ is at least as
important to linear regression models (1) as the ordinary empirical process is
to one-sample location models. One of the purposes of this monograph is to
discuss the role of V-processes in inference and in proving limit theorems in
models (1) in a unified fashion.


1.2. M-, R- AND SCALE ESTIMATORS 


Many inferential procedures involving (1.1.1) can be viewed as functions of
V. For example, the least squares estimator, or more generally, the class of
M-estimators corresponding to the score function $\psi$ (Huber, 1981), is
defined as a solution t of the equation

$$\int \psi(y)\, \mathbf{V}(dy, \mathbf{t}) = \text{a known constant}.$$
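Since $\mathbf{V}(\cdot, \mathbf{t})$ jumps by $\mathbf{x}_{ni}$ at each residual $Y_{ni} - \mathbf{x}_{ni}'\mathbf{t}$, the left side equals $\sum_i \mathbf{x}_{ni}\psi(Y_{ni} - \mathbf{x}_{ni}'\mathbf{t})$. The sketch below treats the case p = 1 and rests on two assumptions. First, the known constant is taken to be 0. Second, $\psi$ is taken to be nondecreasing, so the left side is nonincreasing in t and bisection applies. Huber's $\psi$ with c = 1.345 is used only as an example.

```python
from typing import Callable, Sequence


def huber_psi(u: float, c: float = 1.345) -> float:
    """Huber's score function: identity on [-c, c], clipped outside."""
    return max(-c, min(c, u))


def m_equation(t: float, x: Sequence[float], Y: Sequence[float],
               psi: Callable[[float], float]) -> float:
    # int psi(y) V(dy, t) = sum_i x_i psi(Y_i - x_i t), since V(., t) jumps by x_i at Y_i - x_i t
    return sum(xi * psi(yi - xi * t) for xi, yi in zip(x, Y))


def m_estimate(x, Y, psi=huber_psi, lo=-1e6, hi=1e6, tol=1e-10) -> float:
    """Solve m_equation(t) = 0 by bisection; valid for nondecreasing psi (p = 1)."""
    if m_equation(lo, x, Y, psi) < 0 or m_equation(hi, x, Y, psi) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if m_equation(mid, x, Y, psi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = [1.0, 2.0, 3.0, 4.0]
Y = [1.1, 1.9, 3.2, 3.9]
Y_out = [1.1, 1.9, 3.2, 40.0]            # last response grossly contaminated
print(m_estimate(x, Y, psi=lambda u: u))  # least squares: sum x_i Y_i / sum x_i^2
print(m_estimate(x, Y_out))               # Huber estimate stays near 1.42
```

With $\psi(y) = y$ the equation reduces to the normal equation. With the contaminated response, least squares would give 5.8, while the Huber solution stays near 1.42.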


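Similarly, rank (R) estimators of $\boldsymbol\beta$ corresponding to the score function $\varphi$
are defined to be a solution t of the equation

(1) $$\int \varphi(H_n(y, \mathbf{t}))\, \mathbf{V}(dy, \mathbf{t}) = \text{a known constant},$$

$$H_n(y, \mathbf{t}) := n^{-1}\sum_{i=1}^n I(Y_{ni} \le y + \mathbf{x}_{ni}'\mathbf{t}), \quad y \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p.$$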
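Evaluated at the ith residual, $H_n(Y_{ni} - \mathbf{x}_{ni}'\mathbf{t}, \mathbf{t})$ is the rank of that residual divided by n, when there are no ties. The left side of (1) is therefore the linear rank statistic $\sum_i \mathbf{x}_{ni}\varphi(R_i/n)$. The sketch below computes it directly from the definition. It is O(n²) and meant for illustration only; `rank_statistic` is a hypothetical name. Ties are handled through the "$\le$" in $H_n$.

```python
from typing import Callable, List, Sequence


def rank_statistic(t: Sequence[float], Y: Sequence[float],
                   Xmat: Sequence[Sequence[float]],
                   phi: Callable[[float], float]) -> List[float]:
    """Evaluate int phi(H_n(y, t)) V(dy, t) = sum_i x_ni phi(H_n(r_i, t)), r_i = Y_ni - x_ni' t."""
    n, p = len(Y), len(t)
    r = [yi - sum(a * b for a, b in zip(xi, t)) for yi, xi in zip(Y, Xmat)]
    # H_n(r_i, t) = n^{-1} #{k : r_k <= r_i}; without ties this is rank(r_i) / n
    H = [sum(1 for rk in r if rk <= ri) / n for ri in r]
    return [sum(xi[j] * phi(h) for xi, h in zip(Xmat, H)) for j in range(p)]


Xmat = [[1.0], [2.0], [3.0]]
Y = [3.0, 1.0, 2.0]
print(rank_statistic([0.0], Y, Xmat, phi=lambda u: u))   # ranks 3, 1, 2 -> (1*3 + 2*1 + 3*2) / 3
```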


A significant portion of Nonparametric Inference in models (1.1.1)
deals with M- and R-estimators of $\boldsymbol\beta$ (Adichie, 1967; Huber, 1973) and
linear rank tests of hypotheses about $\boldsymbol\beta$ (Hájek and Šidák, 1967). By viewing
these procedures as functions of $\{\mathbf{V}(y, \mathbf{t}), y \in \mathbb{R}, \mathbf{t} \in \mathbb{R}^p\}$, it is possible to give a
unified treatment of their asymptotic distribution theory, as is done in
Chapters 3 and 4 below.

There is a vast literature in Nonparametric Inference that discusses
inferential procedures based on functionals of empirical processes in the
k-sample location model; see, for example, the books by Puri and Sen (1969), Serfling
(1980) and Huber (1981). Yet their appropriate extensions to the linear
regression model are not readily accessible. This monograph seeks to fill this
void. The methodology and inference procedures studied here extend many
known results in the k-sample location model to the model (1.1.1), thereby
giving a unified treatment.

An important result needed for the study of the asymptotic behavior of
R-estimators of $\boldsymbol\beta$ is the asymptotic uniform linearity of the linear rank
statistics of (1) in the regression parameter vector. Jurečková (1969, 1971)
obtained this result under (1.1.1) with i.i.d. errors. A similar result was
proved in Koul (1969, 1971) and Van Eeden (1972) for linear signed rank
statistics under i.i.d. symmetric errors. Its extension to the case of
nonidentically distributed errors is not readily available. Theorems 3.2.4 and
3.3.3 prove the asymptotic uniform linearity of linear rank and linear signed
rank statistics with bounded scores under the general independent errors
model (1.1.1). In the case of i.i.d. errors, the conditions in these theorems on
the error d.f. are more general than requiring finite Fisher information. The
results are proved uniformly over all bounded score functions and are
consequences solely of the asymptotic sample continuity of V-processes and
some smoothness of $\{F_{ni}\}$. The uniformity with respect to the score
functions is useful when constructing adaptive rank tests that are
asymptotically efficient against Pitman alternatives for a large class of error
distributions.

Chapter 3 also contains a proof of the asymptotic normality of linear
rank and linear signed rank statistics under independent alternatives and for
indicator score functions. This proof proceeds via the weak convergence of
certain basic w.e.p.'s and complements some of the results in Dupač and
Hájek (1969).

Section 4.2a discusses the asymptotic distribution of M-estimators 
under heteroscedastic errors using the asymptotic continuity of V-processes. 
Section 4.2b presents some second order results on bootstrap approximations 
to the distributions of a class of M-estimators. 

In order to make M-estimators scale invariant one often needs an
appropriate robust scale estimator. One such scale estimator, as
recommended by Huber (1981) and others, is

$$s_1 := \mathrm{med}\{|Y_{ni} - \mathbf{x}_{ni}'\hat{\boldsymbol\beta}|;\ 1 \le i \le n\},$$

where $\hat{\boldsymbol\beta}$ is an estimator of $\boldsymbol\beta$. The asymptotic distribution of $s_1$ under
heteroscedastic errors is given in Section 4.3. In the case of i.i.d. errors, this
asymptotic distribution does not depend on $\hat{\boldsymbol\beta}$ provided the errors are
symmetric around 0. This observation naturally leads one to construct a
scale estimator based on the symmetrized residuals, thereby giving another
scale estimator

$$s_2 := \mathrm{med}\{|Y_{ni} - \mathbf{x}_{ni}'\hat{\boldsymbol\beta} - Y_{nj} + \mathbf{x}_{nj}'\hat{\boldsymbol\beta}|;\ 1 \le i, j \le n\}.$$

As expected, the asymptotic distribution of $s_2$ is shown to be free from the
estimator $\hat{\boldsymbol\beta}$ in the case of i.i.d. errors, not necessarily symmetric. It also
appears in Section 4.3.
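Both scale estimators are medians of simple functions of the residuals. The sketch below takes the median for $s_2$ over all ordered pairs $1 \le i, j \le n$, as displayed, so the n zero differences with i = j are included. With an intercept-only design, shifting $\hat{\boldsymbol\beta}$ changes $s_1$ but leaves $s_2$ unchanged, because differences of residuals cancel the shift.

```python
from statistics import median
from typing import List


def residuals(Y, Xmat, beta_hat) -> List[float]:
    # r_i = Y_ni - x_ni' beta_hat
    return [yi - sum(a * b for a, b in zip(xi, beta_hat)) for yi, xi in zip(Y, Xmat)]


def s1(Y, Xmat, beta_hat) -> float:
    """med{|Y_ni - x_ni' beta_hat|; 1 <= i <= n}"""
    return median(abs(r) for r in residuals(Y, Xmat, beta_hat))


def s2(Y, Xmat, beta_hat) -> float:
    """med{|r_i - r_j|; 1 <= i, j <= n} over the symmetrized residuals."""
    r = residuals(Y, Xmat, beta_hat)
    return median(abs(ri - rj) for ri in r for rj in r)


Y = [1, 2, 4, 8]
Xmat = [[1], [1], [1], [1]]    # intercept-only design
print(s1(Y, Xmat, [3]), s2(Y, Xmat, [3]))   # 1.5 2.5
print(s1(Y, Xmat, [0]), s2(Y, Xmat, [0]))   # 3.0 2.5
```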

Section 4.4 discusses the asymptotic distribution of a class of 
R-estimators under heteroscedastic errors using the asymptotic uniform 
linearity results of Chapter 3. The R-estimators considered are 
asymptotically equivalent to Jaeckel’s estimators. 

The complete rank analysis of the linear regression model (1.1.1)
requires an estimate of the scale parameter

$$Q(f) := \int f\, d\varphi(F),$$

where f is the density of the unknown common error d.f. F and $\varphi$ is a
nondecreasing function on (0, 1). This estimate is used to standardize the
test statistic and to estimate the standard error of the R-estimator
corresponding to the score function $\varphi$. This parameter also appears in the
efficiency comparisons of rank procedures, and it is of interest to estimate it,
after the fact, in an analysis.

Lehmann (1963), Sen (1966), Koul (1971), among others, provide
estimators of Q(f) in the one- and two-sample location models and in the
linear regression model. These estimators are given in terms of the lengths
or Lebesgue measures of certain confidence intervals or regions. They are
usually not easy to compute when the dimension p of $\boldsymbol\beta$ is larger than 1.

In Section 4.5, estimators of Q(f), based on kernel-type density
estimators of f and the empirical d.f. $H_n$, are defined and their consistency
under (1.1.1) with i.i.d. errors is proved. An estimator whose window width
is based on the data and is of the order $n^{-1/2}$ is also considered.
The consistency proof presented is a consequence solely of the asymptotic
continuity of certain w.e.p.'s and some smoothness of the error d.f.'s.
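For $\varphi(u) = u$, $Q(f) = \int f^2$, which equals $1/(2\sqrt{\pi}) \approx 0.2821$ for the standard normal density. The sketch below is a hypothetical plug-in estimator and not necessarily the one studied in Section 4.5. It integrates a uniform-kernel density estimate of the residual density against the measure $\varphi(H_n)$ induced by the empirical d.f. The window width h is supplied by the user.

```python
import bisect
import math
import random
from typing import Callable, Sequence


def q_hat(residuals: Sequence[float], h: float,
          phi: Callable[[float], float] = lambda u: u) -> float:
    """Plug-in estimate of Q(f) = int f d(phi(F)) (hypothetical sketch).

    f is replaced by a uniform-kernel density estimate with window h, and
    phi(F) by phi(H_n), which puts mass phi(i/n) - phi((i-1)/n) at the ith
    order statistic of the residuals.
    """
    e = sorted(residuals)
    n = len(e)

    def f_n(x: float) -> float:
        # (2nh)^{-1} #{j : |e_j - x| <= h}
        k = bisect.bisect_right(e, x + h) - bisect.bisect_left(e, x - h)
        return k / (2.0 * n * h)

    return sum(f_n(e[i - 1]) * (phi(i / n) - phi((i - 1) / n)) for i in range(1, n + 1))


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # stand-in for i.i.d. N(0, 1) residuals
target = 1.0 / (2.0 * math.sqrt(math.pi))             # Q(f) for phi(u) = u and the N(0, 1) density
estimate = q_hat(sample, h=0.15)
```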


1.3. MINIMUM DISTANCE ESTIMATORS AND GOODNESS-OF-FIT
TESTS


The practice of obtaining estimators of parameters by minimizing a certain
distance between some functions of observations and parameters has been
present in statistics since its beginning. The classical examples of this
method are the least squares and the minimum chi-square estimators.

The minimum distance estimation (m.d.e.) method, where one obtains 
an estimator of a parameter by minimizing some distance between the 
empirical d.f. and the modeled d.f., was elevated to a general method of
estimation by Wolfowitz (1953, 1954, 1957). In these papers he 
demonstrated that, compared to the maximum likelihood estimation method, 
the m.d.e. method yielded consistent estimators rather cheaply in several 
problems of varied levels of difficulty. 

This methodology saw increasing research activity from the mid 
1970’s when many authors demonstrated various robustness properties of 
certain m.d. estimators. See, e.g., Beran (1977, 1978), Parr and Schucany 
(1979), Millar (1981, 1982, 1984), Donoho and Liu (1988 a, b), among others. 
All of these authors restrict their attention to the one sample setup or to the 
two sample location model. See Parr (1981) for additional bibliography on 
m.d.e. till 1980. 

In spite of many advances made in the m.d.e. methodology in one-sample
models, little was known until the early 1980's as to how to extend this
methodology to one of the most applied models, viz., the multiple linear
regression model (1.1.1). A significant advantage of viewing the model
(1.1.1) through V is that one is naturally led to interesting m.d. estimators
of $\boldsymbol\beta$ that are natural extensions of their one- and two-sample location
model counterparts. To illustrate this, consider the m.d. estimator $\hat\theta$ of the
one-sample location parameter $\theta$, when errors are i.i.d. symmetric around 0,
defined by the relation

$$\hat\theta := \mathrm{argmin}\{T_n(t);\ t \in \mathbb{R}\},$$
with
$$T_n(t) := \int \Big[n^{-1/2}\sum_{i=1}^n \{I(Y_{ni} \le y + t) - I(-Y_{ni} < y - t)\}\Big]^2 dG(y), \quad t \in \mathbb{R},$$

where $G \in \mathcal{DI}(\mathbb{R})$. Since (1.1.1) is an extension of the one-sample location
model, it is only natural to seek an extension of $\hat\theta$ in this model. Assuming
that $\{e_{ni}\}$ are symmetrically distributed around 0, the first thing one is
tempted to consider as an extension of $\hat\theta$ is $\hat{\boldsymbol\beta}_1$, defined by the relation


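$$\hat{\boldsymbol\beta}_1 := \mathrm{argmin}\{K_1(\mathbf{t});\ \mathbf{t} \in \mathbb{R}^p\},$$
with
$$K_1(\mathbf{t}) := \int \Big[n^{-1/2}\sum_{i=1}^n \{I(Y_{ni} \le y + \mathbf{x}_{ni}'\mathbf{t}) - I(-Y_{ni} < y - \mathbf{x}_{ni}'\mathbf{t})\}\Big]^2 dG(y), \quad \mathbf{t} \in \mathbb{R}^p.$$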


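However, any extension of $\hat\theta$ to the linear regression model should
have the property that it reduces to $\hat\theta$ when the model is reduced to the one-sample
location model and, in addition, that it reduces to an appropriate
extension of $\hat\theta$ to the k-sample location model when the model (1.1.1) is
reduced to it. In this sense $\hat{\boldsymbol\beta}_1$ does not provide the right extension but $\hat{\boldsymbol\beta}_X$
does, where

(1) $$\hat{\boldsymbol\beta}_X := \mathrm{argmin}\{K_X(\mathbf{t});\ \mathbf{t} \in \mathbb{R}^p\},$$
with
$$K_X(\mathbf{t}) := \int \mathbf{V}^{+\prime}(y, \mathbf{t})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{V}^{+}(y, \mathbf{t})\, dG(y), \quad \mathbf{t} \in \mathbb{R}^p,$$
$$\mathbf{V}^{+\prime} := (V_1^+, \ldots, V_p^+),$$
$$V_j^+(y, \mathbf{t}) := V_j(y, \mathbf{t}) - \sum_{i=1}^n x_{nij} + V_j(-y, \mathbf{t}), \quad 1 \le j \le p,\ y \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p.$$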
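For p = 1, $K_X(t) = (\sum_i x_{ni}^2)^{-1}\int [V^+(y, t)]^2\, dG(y)$, where $V^+(y, t) = \sum_i x_{ni}\{I(Y_{ni} \le y + x_{ni}t) - I(-Y_{ni} < y - x_{ni}t)\}$. The sketch below makes two choices purely for illustration. G is taken to be a discrete measure with equal masses on a symmetric grid, and the minimization is done by grid search over t. Both are hypothetical simplifications: in the text G is a general member of $\mathcal{DI}(\mathbb{R})$, and grid search is used here because $K_X$ is a step function in t.

```python
import random
from typing import Sequence


def v_plus(y: float, t: float, x: Sequence[float], Y: Sequence[float]) -> float:
    """V^+(y, t) for p = 1: sum_i x_i [I(Y_i <= y + x_i t) - I(-Y_i < y - x_i t)]."""
    return sum(xi * ((yi <= y + xi * t) - (-yi < y - xi * t)) for xi, yi in zip(x, Y))


def K_X(t: float, x, Y, g_points, g_masses) -> float:
    sxx = sum(xi * xi for xi in x)          # X'X for p = 1
    return sum(w * v_plus(y, t, x, Y) ** 2 for y, w in zip(g_points, g_masses)) / sxx


def md_estimate(x, Y, t_grid, g_points, g_masses) -> float:
    # Grid search: K_X is piecewise constant in t, so derivative-based methods do not apply.
    return min(t_grid, key=lambda t: K_X(t, x, Y, g_points, g_masses))


rng = random.Random(1)
x = [float(i) for i in range(1, 51)]                 # nonnegative design, p = 1
Y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]     # true beta = 2, symmetric errors
g_points = [k / 10 for k in range(-30, 31)]          # hypothetical G: equal masses on [-3, 3]
g_masses = [1.0 / len(g_points)] * len(g_points)
t_grid = [1.0 + k / 100 for k in range(201)]
beta_x = md_estimate(x, Y, t_grid, g_points, g_masses)
```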


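In the case errors are not symmetric but i.i.d. according to a known
d.f. F, so that $EV_j(y, \boldsymbol\beta) = \sum_i x_{nij} F(y)$, a suitable class of m.d. estimators of
$\boldsymbol\beta$ is defined by the relation

$$\hat{\boldsymbol\beta} := \mathrm{argmin}\{K(\mathbf{t});\ \mathbf{t} \in \mathbb{R}^p\},$$
with
(2) $$K(\mathbf{t}) := \int \|\mathbf{W}(y, \mathbf{t})\|^2\, dG(y), \quad \mathbf{t} \in \mathbb{R}^p,$$
$$\mathbf{W}(y, \mathbf{t}) := (\mathbf{X}'\mathbf{X})^{-1/2}\{\mathbf{V}(y, \mathbf{t}) - \mathbf{X}'\mathbf{1}F(y)\}, \quad y \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p,$$
where $\mathbf{1}$ denotes the $n \times 1$ vector of ones.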


Chapter 5 discusses the existence, the asymptotic distribution, the
robustness and the asymptotic optimality of $\hat{\boldsymbol\beta}_X$ and $\hat{\boldsymbol\beta}$ under (1.1.1) with
heteroscedastic errors. For example, if p = 1 in (1.1.1) and the design
variable is nonnegative, then the asymptotic variance of $\hat\beta_X$ is smaller than
that of $\hat\beta_1$ for a large class of symmetric error d.f.'s F and integrating
measures G. A similar result holds for $\hat{\boldsymbol\beta}_X$ when p > 1. Chapter 5 also
discusses several other m.d. estimators of $\boldsymbol\beta$ and their asymptotic theory
under (1.1.1) with heteroscedastic errors. These include analogues of $\hat{\boldsymbol\beta}$
when the common error d.f. is unknown and some m.d. estimators
corresponding to certain supremum distances based on V.

Closely related to the problem of minimum distance estimation is the
problem of testing the goodness-of-fit hypothesis $H_0: F_{ni} = F_0$, $F_0$ a known
d.f. One test statistic for this problem is

$$D_1 := \sup_y |n^{1/2}\{H_n(y, \hat{\boldsymbol\beta}) - F_0(y)\}|,$$

where $\hat{\boldsymbol\beta}$ is an estimator of $\boldsymbol\beta$. This test statistic is suggested by looking at
the estimated residuals and mimicking the one-sample location model
technique. In general, its large sample distribution depends on the design
matrix. In addition, it does not reduce to the Kiefer (1959) tests of
goodness-of-fit in the k-sample location problem when (1.1.1) is reduced to
this model. Test statistics that overcome these deficiencies are

$$D_2 := \sup_y |\mathbf{W}^0(y, \hat{\boldsymbol\beta})|, \qquad D_3 := \sup_y \|\mathbf{W}^0(y, \hat{\boldsymbol\beta})\|,$$

where $\mathbf{W}^0$ is equal to the W of (2) with $F = F_0$. Another natural class of
tests is based on $K^0(\hat{\boldsymbol\beta})$, where $K^0$ is equal to the K of (2) with W
replaced by $\mathbf{W}^0$.
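For continuous $F_0$, the suprema only need to be checked at the residuals, taking both the value and the left limit of the step part there. Between jumps the step part is a constant c, and $|c - (\sum_i w_i)F_0(y)|$ is monotone in y. The sketch below assumes p = 1 and $F_0 = \Phi$, the standard normal d.f., as an illustrative choice. In that case $\mathbf{W}^0$ is scalar, so $D_2 = D_3$. Weights $n^{-1/2}$ give $D_1$, and weights $x_{ni}/(\sum_i x_{ni}^2)^{1/2}$ give $D_2$. The helper `grid_sup` is a brute-force cross-check added for illustration.

```python
import math
import random
from typing import Callable, Sequence


def Phi(z: float) -> float:
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sup_weighted(res: Sequence[float], w: Sequence[float], F0: Callable[[float], float]) -> float:
    """sup_y |sum_i w_i {I(res_i <= y) - F0(y)}| for a continuous d.f. F0."""
    total = sum(w)
    best = 0.0  # the limits as y -> -inf and y -> +inf are both 0
    for v in set(res):
        right = sum(wi for ri, wi in zip(res, w) if ri <= v)   # value at v
        left = sum(wi for ri, wi in zip(res, w) if ri < v)     # left limit at v
        Fv = F0(v)
        best = max(best, abs(right - total * Fv), abs(left - total * Fv))
    return best


def grid_sup(res, w, F0, grid) -> float:
    """Brute-force lower approximation of the same supremum over a finite grid."""
    total = sum(w)
    return max(abs(sum(wi for ri, wi in zip(res, w) if ri <= y) - total * F0(y)) for y in grid)


def gof_statistics(x, Y, beta_hat, F0=Phi):
    """(D_1, D_2) for p = 1 from the residuals Y_i - x_i beta_hat."""
    res = [yi - xi * beta_hat for xi, yi in zip(x, Y)]
    n = len(res)
    D1 = sup_weighted(res, [n ** -0.5] * n, F0)
    norm = math.sqrt(sum(xi * xi for xi in x))
    D2 = sup_weighted(res, [xi / norm for xi in x], F0)
    return D1, D2


rng = random.Random(2)
x = [rng.uniform(0.5, 2.0) for _ in range(20)]
Y = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]    # H_0 with F_0 = Phi holds here
D1, D2 = gof_statistics(x, Y, beta_hat=1.5)          # beta_hat = 1.5 plugged in for illustration
```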
All of the above and several other goodness-of-fit tests are discussed
at some length in Chapter 6. Section 6.2a discusses the asymptotic null
distributions of the supremum distance statistics $D_j$, j = 1, 2, 3. Also
discussed in this section are asymptotically distribution free analogues of
these tests, in a sense similar to that discussed by Durbin (1973, 1976) and
Rao (1972) for the one-sample location model. Section 6.2b discusses
smooth bootstrap approximations to the null distributions of tests based on
w.e.p.'s.

Tests based on $L_2$-distances are discussed in Section 6.3. Some
modifications of goodness-of-fit tests when Fo has a scale parameter appear 
in Section 6.4 while tests of the symmetry of the errors are discussed in 
Section 6.5. 


1.4. RANDOMLY WEIGHTED EMPIRICAL PROCESSES 


A randomly weighted empirical process (r.w.e.p.) corresponding to the
r.v.'s $\zeta_{n1}, \ldots, \zeta_{nn}$, the random noise $\delta_{n1}, \ldots, \delta_{nn}$ and
the random real weights $h_{n1}, \ldots, h_{nn}$ is defined to be

(1) $$V_h(x) := n^{-1/2}\sum_{i=1}^n h_{ni}\, I(\zeta_{ni} \le x + \delta_{ni}), \quad x \in \mathbb{R},\ n \ge 1.$$


Examples of r.w.e.p.'s are provided by the w.e.p.'s $\{V_j; 1 \le j \le p\}$ of
(1.1.2) in the case the design variables are random. More importantly,
r.w.e.p.'s arise naturally in autoregression models. To illustrate this, let $\mathbf{Y}_0$
$:= (X_0, X_{-1}, \ldots, X_{1-p})'$ be an observable random vector, $\{\varepsilon_i, i \ge 1\}$ be i.i.d. r.v.'s,
independent of $\mathbf{Y}_0$, and $\boldsymbol\rho' := (\rho_1, \ldots, \rho_p)$ be a p-dimensional parameter
vector. In the pth order autoregression (AR(p)) model one observes $\{X_i\}$
obeying the relation

(2) $$X_i = \rho_1 X_{i-1} + \cdots + \rho_p X_{i-p} + \varepsilon_i, \quad i \ge 1,\ \boldsymbol\rho \in \mathbb{R}^p.$$


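Processes that play a fundamental role in the robust estimation of $\boldsymbol\rho$
in this model are randomly weighted residual empirical processes
$\mathbf{T} := (T_1, \ldots, T_p)'$, where

(3) $$T_j(x, \mathbf{t}) := n^{-1/2}\sum_{i=1}^n g(X_{i-j})\, I(X_i \le x + \mathbf{t}'\mathbf{Y}_{i-1}), \quad x \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p,\ 1 \le j \le p,$$

$\mathbf{Y}_{i-1}' := (X_{i-1}, \ldots, X_{i-p})$, $i \ge 1$, and where g is a measurable function from $\mathbb{R}$
to $\mathbb{R}$. Clearly, for each $1 \le j \le p$, $T_j(x, \boldsymbol\rho + n^{-1/2}\mathbf{t})$ is an example of $V_h(x)$
with $\zeta_{ni} = \varepsilon_i$, $\delta_{ni} = n^{-1/2}\mathbf{t}'\mathbf{Y}_{i-1}$ and $h_{ni} = g(X_{i-j})$.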
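For p = 1, the process (3) is easy to compute from a simulated series. The sketch below simulates an AR(1) path and evaluates $T_1(x, t)$. The helper names are hypothetical, and standard normal innovations and g(x) = x are illustrative choices. At t = ρ the indicator becomes $I(\varepsilon_i \le x)$, which is the representation with $\zeta_{ni} = \varepsilon_i$ described in the text.

```python
import random
from typing import Callable, List, Tuple


def simulate_ar1(rho: float, n: int, x0: float = 0.0, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return ([X_0, X_1, ..., X_n], [eps_1, ..., eps_n]) with X_i = rho X_{i-1} + eps_i."""
    rng = random.Random(seed)
    X, eps = [x0], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        eps.append(e)
        X.append(rho * X[-1] + e)
    return X, eps


def T1(x: float, t: float, X: List[float], g: Callable[[float], float] = lambda u: u) -> float:
    """T_1(x, t) = n^{-1/2} sum_{i=1}^n g(X_{i-1}) I(X_i <= x + t X_{i-1})."""
    n = len(X) - 1
    return sum(g(X[i - 1]) for i in range(1, n + 1) if X[i] <= x + t * X[i - 1]) / n ** 0.5


X, eps = simulate_ar1(0.5, 200, seed=3)
value_at_rho = T1(0.3, 0.5, X)   # uses only I(eps_i <= 0.3), weighted by X_{i-1}
```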


It is customary to expect that a method that works for linear
regression models should have an analogue that will also work in
autoregression models. Indeed, the above inferential procedures based on
w.e.p.'s in linear regression have perfect analogues in AR(p) models in
terms of T. The generalized M-estimators of $\boldsymbol\rho$ as proposed by Denby and
Martin (1979), corresponding to the weight function g and the score
function $\psi$, are given as a solution t of the p equations

$$\int \psi(x)\, \mathbf{T}(dx, \mathbf{t}) = \mathbf{0},$$

assuming that $E\psi(\varepsilon_1) = 0$. Clearly, the classical least squares estimator is
obtained upon taking $g(x) = x = \psi(x)$ in these equations.
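For p = 1 the G-M equation reads $\sum_{i=1}^n g(X_{i-1})\psi(X_i - tX_{i-1}) = 0$; the factor $n^{-1/2}$ does not affect the root. The sketch below assumes that ψ is nondecreasing and that $g(u)u \ge 0$ for all u, which holds for g(x) = x. Under these assumptions the left side is nonincreasing in t, so a bracketing bisection applies. The Huber ψ and the bracket [-10, 10] are illustrative choices. Taking $\psi(x) = x$ reproduces the least squares estimator $\sum_i X_{i-1}X_i / \sum_i X_{i-1}^2$.

```python
import random
from typing import Callable, List


def huber_psi(u: float, c: float = 1.345) -> float:
    return max(-c, min(c, u))


def gm_equation(t: float, X: List[float], g: Callable[[float], float],
                psi: Callable[[float], float]) -> float:
    # sum_{i=1}^n g(X_{i-1}) psi(X_i - t X_{i-1}) = n^{1/2} * int psi(x) T(dx, t) for p = 1
    return sum(g(X[i - 1]) * psi(X[i] - t * X[i - 1]) for i in range(1, len(X)))


def gm_estimate(X, g=lambda u: u, psi=huber_psi, lo=-10.0, hi=10.0, tol=1e-12) -> float:
    if gm_equation(lo, X, g, psi) < 0 or gm_equation(hi, X, g, psi) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gm_equation(mid, X, g, psi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(4)
X = [0.0]
for _ in range(1000):
    X.append(0.5 * X[-1] + rng.gauss(0.0, 1.0))   # AR(1) with rho = 0.5
rho_gm = gm_estimate(X)                            # g(x) = x, Huber psi
rho_ls = gm_estimate(X, psi=lambda u: u)           # g(x) = x = psi(x): least squares
```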


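A generalized R-estimator $\hat{\boldsymbol\rho}_R$ corresponding to a score function $\varphi$
is defined by the relation

(4) $$\hat{\boldsymbol\rho}_R := \mathrm{argmin}\{\|\mathbf{S}(\mathbf{t})\|;\ \mathbf{t} \in \mathbb{R}^p\},$$
where
$$\mathbf{S}(\mathbf{t}) := \int \varphi(F_n(x, \mathbf{t}))\, \mathbf{T}(dx, \mathbf{t}),$$
$$F_n(x, \mathbf{t}) := n^{-1}\sum_{i=1}^n I(X_i \le x + \mathbf{t}'\mathbf{Y}_{i-1}), \quad x \in \mathbb{R},\ \mathbf{t} \in \mathbb{R}^p.$$

An analogue of an R-estimator of (1.2.1) is obtained by taking g(x) = x in
(4).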


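The m.d. estimators $\hat{\boldsymbol\rho}_X$ that are analogues of $\hat{\boldsymbol\beta}_X$ of (1.3.1) are
defined as minimizers, w.r.t. $\mathbf{t} \in \mathbb{R}^p$, of

$$K(\mathbf{t}) := \sum_{j=1}^p \int \Big[n^{-1/2}\sum_{i=1}^n X_{i-j}\{I(X_i \le x + \mathbf{t}'\mathbf{Y}_{i-1}) - I(-X_i < x - \mathbf{t}'\mathbf{Y}_{i-1})\}\Big]^2 dG(x).$$

Observe that K involves T corresponding to g(x) = x.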
Chapter 7 discusses these and some other procedures in detail. 


Section 7.2 contains a result that says that the r.w.e.p.'s $\{\mathbf{T}(x, \boldsymbol\rho + n^{-1/2}\mathbf{t}),
x \in \mathbb{R}, \|\mathbf{t}\| \le B\}$ and the residual empirical processes $\{F_n(x, \boldsymbol\rho + n^{-1/2}\mathbf{t}), x \in \mathbb{R},
\|\mathbf{t}\| \le B\}$ are asymptotically uniformly linear in t, for every $0 < B < \infty$.
These results are used to investigate the asymptotic behavior of G-M and R-estimators
in Sections 7.3a and 7.3b, respectively. In order to carry out the
rank analysis in AR(p) models, one needs a consistent estimator of Q(f),
where now f is the error density of $\{\varepsilon_i\}$. A class of such estimators is
given in Section 7.3c. A large class of m.d. estimators and their asymptotics
appears in Section 7.4, whereas Section 7.5 briefly discusses some tests of
goodness-of-fit hypotheses pertaining to the error d.f.

The contents of Chapter 2 are basic to those of Chapters 3, 4, and
parts of Chapters 6 and 7. Sections 2.2a and 2.2b contain, respectively,
proofs of the weak convergence of suitably standardized w.e.p.'s and
r.w.e.p.'s to continuous Gaussian processes. Even though w.e.p.'s are a
special case of r.w.e.p.'s, it is beneficial to investigate their weak convergence
separately. For example, the weak convergence of $U_d$ is obtained under a
fairly general independent setup and minimal conditions on $\{d_{ni}\}$, whereas
that of $V_h$ is obtained under some hierarchical dependence structure on $\{\zeta_{ni},
h_{ni}, \delta_{ni}\}$ and the boundedness of the weights $\{h_{ni}\}$.
In Section 2.3, the asymptotic continuity of certain standardized
w.e.p.'s is used to prove the asymptotic uniform linearity of $\mathbf{V}(\cdot, \mathbf{t})$ in t, for t
in certain shrinking neighborhoods of $\boldsymbol\beta$, under fairly general heteroscedastic
errors. This result is used in Chapter 4 when discussing
M-estimators and in Chapter 6 when discussing supremum distance test
statistics for goodness-of-fit hypotheses. The asymptotic continuity is also
used in Chapter 3 to prove various results about rank and signed
rank statistics under heteroscedastic errors. The asymptotic continuity of
$V_h$-processes is used in Chapter 7 when discussing the AR(p) model.

Chapter 2 concludes with results on functional and bounded laws of the
iterated logarithm pertaining to certain w.e.p.'s. It also includes an
inequality due to Marcus and Zinn (1984) that gives an exponential bound on
the tail probabilities of w.e.p.'s of independent r.v.'s. This inequality is an
extension of the celebrated Dvoretzky, Kiefer and Wolfowitz (1956)
inequality for the ordinary empirical process. A result about the weak
convergence of w.e.p.'s when the r.v.'s are p-dimensional is also stated. These
results are included for completeness, without proofs. They are not used in
the subsequent sections. A martingale property of a properly centered $U_d$-process
is proved in Section 2.4.


CHAPTER 2 


ASYMPTOTIC PROPERTIES OF 
WEIGHTED EMPIRICALS 


2.1. INTRODUCTION. 


Let, for each $n \ge 1$, $\eta_{n1}, \ldots, \eta_{nn}$ be independent r.v.'s taking values in [0, 1]
with respective d.f.'s $G_{n1}, \ldots, G_{nn}$, and let $d_{n1}, \ldots, d_{nn}$ be real numbers.
Define

(1) $$W_d(t) := \sum_{i=1}^n d_{ni}\{I(\eta_{ni} \le t) - G_{ni}(t)\}, \quad 0 \le t \le 1.$$

Observe that both $V_h$ of (1.4.1) and $W_d$ belong to $\mathcal{D}[0, 1]$ for each n and
for any triangular arrays $\{h_{ni}, 1 \le i \le n\}$ and $\{d_{ni}, 1 \le i \le n\}$.

In this chapter we first prove certain weak convergence results about
suitably standardized $W_d$ and $V_h$ processes. This is done in Sections 2.2a
and 2.2b, respectively. Section 2.3 uses the asymptotic continuity of a
certain $W_d$-process to obtain the asymptotic uniform linearity result about
$\mathbf{V}(\cdot, \mathbf{u})$ of (1.1.2) in u. An analogous result for $\mathbf{T}(\cdot, \mathbf{u})$ of (1.4.3) uses the
asymptotic continuity of a certain $V_h$-process and is proved in Section 7.2.

A proof of an exponential inequality for a stopped martingale with 
bounded differences due to Johnson, Schechtman and Zinn (1985) and 
Levental (1989) is included in Section 2.2b. This inequality is of general 
interest and an important tool needed to carry out a chaining argument 
pertaining to the weak convergence of $V_h$.

Section 2.4 treats laws of the iterated logarithm pertaining to $W_d$, the
weak convergence of $W_d$ when $\{\eta_{ni}\}$ are in $[0, 1]^p$, the weak convergence of
$W_d$ w.r.t. some other metrics when $\{\eta_{ni}\}$ are in [0, 1], an embedding result
for $W_d$ when $\{\eta_{ni}\}$ are i.i.d. uniform [0, 1] r.v.'s, and a proof of its
martingale property. It also includes an exponential inequality for the tail
probabilities of w.e.p.'s of independent r.v.'s. This inequality is an extension
of the celebrated Dvoretzky, Kiefer and Wolfowitz (1956) inequality for
the ordinary empirical process. Apart from the martingale property, these results are stated for the sake of
completeness, without proofs. They are not used in the subsequent sections.


2.2. WEAK CONVERGENCE
2.2a. $W_d$-Processes.


In this section we give two proofs of the weak convergence of suitably
standardized $\{W_d\}$ to a limit in $\mathcal{C}[0, 1]$. Accordingly, let
(1) $$G_d(t) := \sum_{i=1}^n d_{ni}^2\, G_{ni}(t), \quad 0 \le t \le 1,$$
and
(2) $$C_d(s, t) := \sum_{i=1}^n d_{ni}^2\, [G_{ni}(s \wedge t) - G_{ni}(s)G_{ni}(t)], \quad 0 \le s, t \le 1.$$


Let $\rho$ denote the supremum metric.
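The following small numerical illustration covers (1) and (2) and the covariance interpretation of $C_d$. Because the $\eta_{ni}$ are independent, $\mathrm{Var}\,W_d(t) = C_d(t, t)$. The sketch assumes n = 2, weights (0.6, 0.8) so that $\sum_i d_{ni}^2 = 1$, $G_{n1}(t) = t$ and $G_{n2}(t) = t^2$ on [0, 1]. These choices are illustrative. If every $G_{ni}$ is the uniform d.f. and $\sum_i d_{ni}^2 = 1$, then $C_d(s, t) = s \wedge t - st$, the Brownian bridge covariance.

```python
import random
from typing import Callable, Sequence

DF = Callable[[float], float]


def G_d(t: float, d: Sequence[float], G: Sequence[DF]) -> float:
    return sum(di * di * Gi(t) for di, Gi in zip(d, G))


def C_d(s: float, t: float, d: Sequence[float], G: Sequence[DF]) -> float:
    return sum(di * di * (Gi(min(s, t)) - Gi(s) * Gi(t)) for di, Gi in zip(d, G))


def W_d(t: float, eta: Sequence[float], d: Sequence[float], G: Sequence[DF]) -> float:
    # sum_i d_ni {I(eta_ni <= t) - G_ni(t)} for one realization of (eta_n1, ..., eta_nn)
    return sum(di * ((ei <= t) - Gi(t)) for ei, di, Gi in zip(eta, d, G))


d = [0.6, 0.8]
G = [lambda t: t, lambda t: t * t]          # d.f.'s of U and sqrt(U), U uniform on [0, 1]
rng = random.Random(5)
draws = [W_d(0.4, [rng.random(), rng.random() ** 0.5], d, G) for _ in range(20000)]
mean = sum(draws) / len(draws)
mc_var = sum((w - mean) ** 2 for w in draws) / (len(draws) - 1)   # Monte Carlo Var W_d(0.4)
```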


Theorem 2.2a.1. Let $\{\eta_{ni}\}$, $\{d_{ni}\}$ and $\{G_{ni}\}$ be as in Section 2.1.
In addition assume that the following hold:

(N1) $\sum_{i=1}^n d_{ni}^2 = 1$, for all $n \ge 1$.
(N2) $\max_{1 \le i \le n} d_{ni}^2 \to 0$.
(C) $\lim_{\delta \to 0} \limsup_n \sup_{0 \le t \le 1-\delta} [G_d(t + \delta) - G_d(t)] = 0$.

Then, for every $\varepsilon > 0$,

(i) $\lim_{\delta \to 0} \limsup_n P\big(\sup_{|t-s| < \delta} |W_d(t) - W_d(s)| > \varepsilon\big) = 0$.

(ii) Moreover, $W_d \Rightarrow$ some W on $(\mathcal{D}[0, 1], \rho)$ if and only if for every
$0 \le s, t \le 1$, $C_d(s, t)$ converges to some covariance function C(s, t).

In this case W is necessarily a continuous Gaussian process with
covariance function C and W(0) = 0 = W(1).


Remark 2.2a.1. Perhaps a remark about the labeling of the conditions
is in order. The letter N in (N1) and (N2) stands for Noether, who was the
first person to use these conditions to obtain the asymptotic normality of
certain weighted sums of r.v.'s. See Noether (1949).

The letter C in the condition (C) stands for the specified continuity
of the sequence $\{G_d\}$. Observe that the d.f.'s $\{G_{ni}\}$ need not be continuous
for each i and n; only $\{G_d\}$ needs to be equicontinuous in the sense of
(C). Of course, if $\{\eta_{ni}\}$ are i.i.d. G then, because of (N1), (C) is equivalent
to the continuity of G. □


The proof of the theorem will follow from the following two lemmas.

Lemma 2.2a.1. For any $0 \le s \le t \le u \le 1$ and each $n \ge 1$,
$$E|W_d(t) - W_d(s)|^2|W_d(u) - W_d(t)|^2$$
(3a) $$\le 3[G_d(u) - G_d(t)][G_d(t) - G_d(s)]$$
(3b) $$\le 3[G_d(u) - G_d(s)]^2.$$
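Proof. Fix $0 \le s \le t \le u \le 1$ and let
$p_i = G_i(t) - G_i(s), \quad q_i = G_i(u) - G_i(t),$
$\alpha_i = I(s < \eta_i \le t) - p_i, \quad \beta_i = I(t < \eta_i \le u) - q_i, \quad 1 \le i \le n.$


Observe that $E\alpha_i = 0 = E\beta_j$ for all $1 \le i, j \le n$, that $\{\alpha_i\}$ are independent, as
are $\{\beta_i\}$, and that $\alpha_i$ is independent of $\beta_j$ for $i \ne j$. Moreover,


$W_d(t) - W_d(s) = \sum_i d_i \alpha_i, \qquad W_d(u) - W_d(t) = \sum_i d_i \beta_i.$
Now expand and multiply the quadratics and use the above facts to obtain


(4) $E|W_d(t) - W_d(s)|^2 |W_d(u) - W_d(t)|^2$
$\quad = \sum_i d_i^4 E\alpha_i^2\beta_i^2 + \sum_{i \ne j} d_i^2 d_j^2 E\alpha_i^2 E\beta_j^2 + 2 \sum_{i \ne j} d_i^2 d_j^2 E(\alpha_i\beta_i) E(\alpha_j\beta_j).$


But
$E\alpha_i^2 = p_i(1 - p_i), \quad E\beta_j^2 = q_j(1 - q_j);$
$E\alpha_i^2\beta_i^2 = (1 - p_i)^2 p_i q_i^2 + (1 - q_i)^2 q_i p_i^2 + p_i^2 q_i^2 (1 - q_i - p_i)$
$\quad \le \{(1 - p_i) + (1 - q_i) + (1 - q_i - p_i)\} p_i q_i$
$\quad \le 3 p_i q_i;$
$E(\alpha_i\beta_i) = -(1 - p_i) p_i q_i - (1 - q_i) q_i p_i + p_i q_i (1 - q_i - p_i)$
$\quad = -p_i q_i, \quad 1 \le i \le n, \ 1 \le j \le n.$
Therefore, since $p_i q_i p_j q_j \le p_i q_j$,
(5) LHS (4) $\le 3\{\sum_i d_i^4 p_i q_i + \sum_{i \ne j} d_i^2 d_j^2 p_i q_j\} = 3[\sum_i d_i^2 p_i][\sum_j d_j^2 q_j].$


This completes the proof of (3a), in view of the definition of $\{p_i, q_j\}$. That
of (3b) follows from (3a), (1) and the monotonicity of the $G_i$, $1 \le i \le n$. □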
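The two moment identities used in (5) can be checked exactly with rational arithmetic. In this sketch (names hypothetical) a single $\eta_i$ is reduced to a three-point experiment: it falls in $(s, t]$ with probability $p$, in $(t, u]$ with probability $q$, and elsewhere otherwise.

```python
from fractions import Fraction as Fr


def cross_moments(p, q):
    """Exact E[alpha^2 beta^2] and E[alpha beta] for a single eta_i.

    eta_i lies in (s, t] with prob. p, in (t, u] with prob. q and elsewhere
    with prob. 1 - p - q; alpha = I(s < eta <= t) - p, beta = I(t < eta <= u) - q.
    """
    cases = [(p, 1 - p, -q), (q, -p, 1 - q), (1 - p - q, -p, -q)]
    e_a2b2 = sum(w * a * a * b * b for w, a, b in cases)
    e_ab = sum(w * a * b for w, a, b in cases)
    return e_a2b2, e_ab


def check_grid(steps=10):
    """Verify E(alpha beta) = -pq, the closed form of E(alpha^2 beta^2) and the
    bound E(alpha^2 beta^2) <= 3pq on a rational grid with p + q <= 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            p, q = Fr(i, steps), Fr(j, steps)
            e_a2b2, e_ab = cross_moments(p, q)
            assert e_ab == -p * q
            assert e_a2b2 == ((1 - p) ** 2 * p * q ** 2 + (1 - q) ** 2 * q * p ** 2
                              + p ** 2 * q ** 2 * (1 - p - q))
            assert e_a2b2 <= 3 * p * q
    return True


if __name__ == "__main__":
    print(cross_moments(Fr(1, 4), Fr(1, 2)))  # (Fraction(3, 64), Fraction(-1, 8))
    print(check_grid())                        # True
```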


Lemma 2.2a.2. For every $\epsilon > 0$ and $s \le u$,
(6) $P[\sup_{s \le t \le u} |W_d(t) - W_d(s)| \ge \epsilon]$
$\quad \le \kappa \epsilon^{-4} [G_d(u) - G_d(s)]^2 + P[|W_d(u) - W_d(s)| \ge \epsilon/2],$
where $\kappa$ does not depend on $\epsilon$, $n$ or on any underlying quantity.
Proof. Let $\delta = u - s$, $m \ge 1$ be an integer,
(7) $\xi_j = W_d((j/m)\delta + s) - W_d(((j-1)/m)\delta + s), \quad 1 \le j \le m,$


$S_k = \sum_{j=1}^k \xi_j, \qquad M_m = \max_{1 \le k \le m} |S_k|.$


The right continuity of $W_d$ implies that for each $n$ and each sample path,
$M_m \to \sup\{|W_d(t) - W_d(s)|;\ s \le t \le u\}$ as $m \to \infty$, w.p.1. In view of Lemma
2.2a.1, Lemma A.1 in the Appendix is applicable to the above r.v.'s $\{\xi_j\}$
with $\gamma = 2$, $\alpha = 1$ and

$u_j = 3^{1/2}\{G_d((j/m)\delta + s) - G_d(((j-1)/m)\delta + s)\}, \quad 1 \le j \le m.$
Hence (6) follows from that lemma and the right continuity of $W_d$. □

Proof of Theorem 2.2a.1. For a $\delta > 0$, let $r = [\delta^{-1}]$, the greatest
integer less than or equal to $1/\delta$. Define $t_j = j\delta$, $1 \le j \le r$, and $t_0 = 0$. Let
$T_j = W_d(t_j) - W_d(t_{j-1})$, $1 \le j \le r$. Then


$P(\sup_{|t-s| < \delta} |W_d(t) - W_d(s)| \ge \epsilon)$
$\quad \le \sum_{j=1}^r P[\sup_{t_{j-1} \le s \le t_j} |W_d(s) - W_d(t_{j-1})| \ge \epsilon/3]$
$\quad \le 3^4 \kappa \epsilon^{-4} \sum_{j=1}^r [G_d(t_j) - G_d(t_{j-1})]^2 + \sum_{j=1}^r P[|T_j| \ge \epsilon/6]$
$\quad \le 3^4 \kappa \epsilon^{-4} \sup_{0 \le t \le 1-\delta} [G_d(t + \delta) - G_d(t)] + \sum_{j=1}^r P[|T_j| \ge \epsilon/6]$


(8) $\quad = I_n(\delta) + II_n(\delta)$, (say).
In the above, the first inequality follows from Lemma A.2 of the Appendix,
the second inequality follows from Lemma 2.2a.2 above and the last
inequality follows because, by (N1),


(9) $\sum_j [G_d(t_j) - G_d(t_{j-1})] \le G_d(1) = 1.$
Next, observe that
(10) $\sigma_j^2 := \mathrm{Var}(T_j) = \sum_i d_i^2 \{G_i(t_j) - G_i(t_{j-1})\}\{1 - G_i(t_j) + G_i(t_{j-1})\}$
$\quad \le G_d(t_j) - G_d(t_{j-1}), \quad 1 \le j \le r,$
and, by (9), that
(11) $\sum_j \sigma_j^4 \le \sup_{0 \le t \le 1-\delta} [G_d(t + \delta) - G_d(t)]$, all $r$ and all $n$.
Furthermore, (N1) and (N2) enable one to apply the Lindeberg-Feller
Central Limit Theorem (L-F CLT) to conclude that $\sigma_j^{-1} T_j \to_d Z$, $Z$ a $N(0, 1)$
r.v. Therefore, for every $\delta > 0$ (i.e., $r < \infty$),


(12) $|II_n(\delta) - \sum_{j=1}^r P(|Z| \ge (\epsilon/6)\sigma_j^{-1})| \to 0$ as $n \to \infty$.


By the Markov inequality applied to the summands in the second term of
(12) and by (11),


(13) $\limsup_n II_n(\delta) \le 3 \limsup_n \sum_{j=1}^r (6\sigma_j/\epsilon)^4 \quad (EZ^4 = 3)$
$\quad \le 3 \cdot 6^4 \epsilon^{-4} \limsup_n \sup_{0 \le t \le 1-\delta} [G_d(t + \delta) - G_d(t)].$
The result (i) now follows from (13), (8) and the assumption (C).


Proof of (ii). Suppose $C_d \to C$. Let $m$ be a positive integer,
$0 \le t_1, \ldots, t_m \le 1$ and $a_1, \ldots, a_m$ be arbitrary but fixed numbers. Consider


(14) $T_n := \sum_{j=1}^m a_j W_d(t_j) = \sum_{i=1}^n d_i V_i,$
where
$V_i := \sum_{j=1}^m a_j \{I(\eta_i \le t_j) - G_i(t_j)\}, \quad 1 \le i \le n.$
Note that
(15) $|V_i| \le \sum_j |a_j| < \infty, \quad 1 \le i \le n.$


Also, $\mathrm{Var}(T_n) \to \tau^2 = \sum_j \sum_k a_j a_k C(t_j, t_k)$. In view of (N1) and (N2), the
L-F CLT yields that $T_n \to_d N(0, \tau^2)$. Hence all finite dimensional
distributions of $W_d$ converge weakly to those of a Gaussian process $W$
with the covariance function $C$ and $W(0) = 0 = W(1)$. In view of (i), this
implies that $W_d \Rightarrow W$ in $(D[0, 1], \rho)$, with $W$ denoting a continuous
Gaussian process tied down at 0 and 1.


Conversely, suppose $W_d \Rightarrow W$. By (i), $W$ is in $C[0, 1]$. In
particular the $T_n$ of (14) converges in distribution to $T := \sum_j a_j W(t_j)$.
Moreover, (15) and (N1) imply that, for all $n \ge 1$,
$ET_n^4 = E(\sum_i d_i V_i)^4 = \sum_i d_i^4 EV_i^4 + 3\sum_{i \ne j} d_i^2 d_j^2 EV_i^2 EV_j^2 \le 3(\sum_j |a_j|)^4.$


Therefore $\{T_n^2, n \ge 1\}$ is uniformly integrable and hence
$ET_n^2 = \sum_{j=1}^m \sum_{k=1}^m a_j a_k C_d(t_j, t_k) \to \sum_{j=1}^m \sum_{k=1}^m a_j a_k \mathrm{Cov}[W(t_j), W(t_k)]$


for any set of numbers $0 \le t_1, \ldots, t_m \le 1$ and any finite real numbers $a_1, \ldots, a_m$.
Hence


$C_d(s, t) \to \mathrm{Cov}[W(s), W(t)] = C(s, t)$ for all $0 \le s, t \le 1$.


Now repeat the above argument of the "if" part to conclude that $W$
must be a tied down Gaussian process in $C[0, 1]$. □


Another set of sufficient conditions for the weak convergence of $\{W_d\}$
is given in the following


Theorem 2.2a.2. Under the notation of Theorem 2.2a.1, suppose that
(N1) holds. In addition, assume that the following hold:


(B) $n \max_{1 \le i \le n} d_{ni}^2 = O(1)$,
and

(D) $n^{-1} \sum_{i=1}^n G_{ni}(t) - t$ is nonincreasing in $t$, $0 \le t \le 1$, $n \ge 1$.


Then also (i) and (ii) of Theorem 2.2a.1 hold.
Remark 2.2a.2. Clearly (B) implies (N2). Moreover, for $0 \le t \le 1 - \delta$,


$G_d(t + \delta) - G_d(t) \le n \max_i d_i^2 \, [n^{-1} \sum_i \{G_i(t + \delta) - G_i(t)\}]$
$\quad = n \max_i d_i^2 \, [n^{-1} \sum_i \{G_i(t + \delta) - (t + \delta)\} - n^{-1} \sum_i \{G_i(t) - t\} + \delta]$
$\quad \le n \max_i d_i^2 \, \delta$, by (D).
Thus (B) and (D) together imply (N2) and (C). Hence Theorem 2.2a.2
follows from Theorem 2.2a.1. However, we can also give a different proof of
Theorem 2.2a.2 which is direct and quite interesting (see (19) below). This
proof will be based on the following three lemmas.
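The inequality in the remark above can be illustrated numerically. The sketch assumes a hypothetical family $G_i(t) = t + e_i \sin(2\pi t)/(2\pi)$ with $|e_i| \le 1$ and $\sum_i e_i = 0$; each $G_i$ is then a d.f. on [0, 1] and $n^{-1}\sum_i G_i(t) = t$, so (D) holds. The modulus of $G_d$ is computed on a grid and compared with $n \max_i d_i^2 \, \delta$.

```python
import math


def make_g(e):
    """G(t) = t + e sin(2 pi t) / (2 pi); a d.f. on [0, 1] whenever |e| <= 1."""
    return lambda t: t + e * math.sin(2.0 * math.pi * t) / (2.0 * math.pi)


def modulus_and_bound(d, G, delta, grid=2000):
    """Grid approximation of sup_t [G_d(t + delta) - G_d(t)], and n max_i d_i^2 delta."""
    def g_d(t):
        return sum(di * di * Gi(t) for di, Gi in zip(d, G))

    steps = [k / grid for k in range(grid + 1) if k / grid + delta <= 1.0]
    modulus = max(g_d(t + delta) - g_d(t) for t in steps)
    bound = len(d) * max(di * di for di in d) * delta
    return modulus, bound


if __name__ == "__main__":
    e = [0.9, -0.9, 0.5, -0.5]           # sum e_i = 0, so (D) holds
    G = [make_g(ei) for ei in e]
    raw = [1.0, 2.0, 1.0, 3.0]           # hypothetical weights
    norm = math.sqrt(sum(r * r for r in raw))
    d = [r / norm for r in raw]          # normalized so that (N1) holds
    for delta in (0.01, 0.05, 0.2):
        m, b = modulus_and_bound(d, G, delta)
        print(delta, m, b, m <= b)
```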


Lemma 2.2a.3. Under (D), for all $n \ge 1$,
(16) $E|W_d(t) - W_d(s)|^4 \le \kappa_d^2 \{3(t - s)^2 + |t - s| n^{-1}\}, \quad 0 \le s, t \le 1,$


where $\kappa_d = n \max_{1 \le i \le n} d_i^2$.


Proof. Suppose $0 \le s \le t \le 1$. Let $\alpha_i$ and $p_i$ be as in the proof of
Lemma 2.2a.1. Using the independence of $\{\alpha_i\}$ and the fact that $E\alpha_i = 0$
for all $1 \le i \le n$, one obtains
$E|W_d(t) - W_d(s)|^4 = E(\sum_i d_i \alpha_i)^4$
$\quad = \sum_i d_i^4 E\alpha_i^4 + 3 \sum_{i \ne j} d_i^2 d_j^2 E\alpha_i^2 E\alpha_j^2$
$\quad = \sum_i d_i^4 \{E\alpha_i^4 - 3 (E\alpha_i^2)^2\} + 3 (\sum_i d_i^2 E\alpha_i^2)^2$
$\quad = \sum_i d_i^4 p_i(1 - p_i)\{1 - 6 p_i(1 - p_i)\} + 3 [\sum_i d_i^2 p_i(1 - p_i)]^2$
(17) $\quad \le \kappa_d^2 \{n^{-2}\sum_i p_i + 3 (n^{-1}\sum_i p_i)^2\}.$
But $s \le t$ and (D) imply
$0 \le n^{-1}\sum_i p_i = n^{-1}\sum_i [G_i(t) - G_i(s)] \le (t - s).$


Hence,
l.h.s. (16) $\le \kappa_d^2 \{n^{-1}(t - s) + 3(t - s)^2\}, \quad 0 \le s \le t \le 1.$


The proof is completed by interchanging the roles of $s$ and $t$ in the above
argument in the case $t < s$. □
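The fourth-moment expansion behind (16) and (17) can likewise be verified exactly for a small number of independent centered indicators. The sketch below (hypothetical names, small $n$ only, since it enumerates all $2^n$ outcomes) compares brute-force enumeration with the closed form in the proof and with the bound (17).

```python
from fractions import Fraction as Fr
from itertools import product


def fourth_moment(d, p):
    """Exact E(sum_i d_i alpha_i)^4, alpha_i = I(A_i) - p_i, A_i independent.

    Enumerates all 2^n outcomes, so it is only meant for small n.
    """
    total = Fr(0)
    for bits in product((0, 1), repeat=len(d)):
        weight, s = Fr(1), Fr(0)
        for b, di, pi in zip(bits, d, p):
            weight *= pi if b else 1 - pi
            s += di * (b - pi)
        total += weight * s ** 4
    return total


def closed_form(d, p):
    """sum_i d_i^4 p_i(1-p_i){1 - 6p_i(1-p_i)} + 3[sum_i d_i^2 p_i(1-p_i)]^2."""
    first = sum(di ** 4 * pi * (1 - pi) * (1 - 6 * pi * (1 - pi)) for di, pi in zip(d, p))
    second = sum(di ** 2 * pi * (1 - pi) for di, pi in zip(d, p))
    return first + 3 * second ** 2


def kappa_bound(d, p):
    """kappa_d^2 {n^{-2} sum_i p_i + 3 (n^{-1} sum_i p_i)^2}, the right side of (17)."""
    n = len(d)
    kappa = n * max(di ** 2 for di in d)
    mean_p = sum(p) / n
    return kappa ** 2 * (mean_p / n + 3 * mean_p ** 2)


if __name__ == "__main__":
    d = [Fr(1, 2)] * 4
    p = [Fr(1, 10), Fr(1, 5), Fr(3, 10), Fr(1, 2)]
    print(fourth_moment(d, p) == closed_form(d, p))  # True
    print(fourth_moment(d, p) <= kappa_bound(d, p))  # True
```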


Next, define, for $(i-1)/n \le t \le i/n$, $1 \le i \le n$,
(18) $Z_d(t) = W_d((i-1)/n) + \{nt - (i-1)\}[W_d(i/n) - W_d((i-1)/n)].$
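A minimal sketch of the polygonal process (18), assuming the values $W_d(i/n)$, $i = 0, \ldots, n$, of one sample path are already stored in a list; `z_d` is a hypothetical helper name.

```python
def z_d(w_grid, t):
    """Z_d(t) of (18): linear interpolation of the values W_d(i/n).

    w_grid[i] holds W_d(i/n), i = 0, ..., n; t is assumed to lie in [0, 1].
    """
    n = len(w_grid) - 1
    if t >= 1.0:
        return w_grid[n]
    i = int(t * n) + 1            # so that (i - 1)/n <= t < i/n
    weight = n * t - (i - 1)      # the factor {nt - (i - 1)} in (18)
    return w_grid[i - 1] + weight * (w_grid[i] - w_grid[i - 1])


if __name__ == "__main__":
    w = [0.0, 1.0, -1.0, 0.5, 0.0]   # hypothetical values of W_d at 0, 1/4, ..., 1
    print([z_d(w, t) for t in (0.0, 0.125, 0.5, 0.875, 1.0)])  # [0.0, 0.5, -1.0, 0.25, 0.0]
```

At the grid points $Z_d$ agrees with $W_d$, which is why $\xi_{k,m}$ below can be written either way.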
Lemma 2.2a.4. The assumption (D) implies that


(19) $E|Z_d(t) - Z_d(s)|^4 \le 144\, \kappa_d^2 |t - s|^2, \quad 0 \le s, t \le 1, \ n \ge 1.$


If, in addition, (N1) and (B) hold, then
(20) $\sup_t |W_d(t) - Z_d(t)| = o_p(1).$


Proof. Let $n \ge 1$ and $0 \le s, t \le 1$ be arbitrary but fixed. By symmetry we may assume
$s \le t$. Choose integers $1 \le i \le j \le n$ such that
(21) $(i-1)/n \le s \le i/n$ and $(j-1)/n \le t \le j/n$.
For the sake of convenience, let
$\xi_{k,m} := |Z_d(m/n) - Z_d(k/n)| = |W_d(m/n) - W_d(k/n)|,$
$b_{k,m} := 4\kappa_d^2 [(m-k)/n]^2$, $m, k$ integers;
$\Delta_{u,v} := |Z_d(u) - Z_d(v)|, \quad 0 \le u, v \le 1.$


From (16),


(22) $E\xi_{k,m}^4 \le \kappa_d^2 \{3(m-k)^2/n^2 + n^{-2}|m-k|\} \le 4\kappa_d^2 [(m-k)/n]^2 = b_{k,m}.$


The proof of (19) will be completed by considering the following three
cases.


Case 1. $i < j - 1$. Then because of (18) and (21),
$\Delta_{s,t} \le \max\{\xi_{i,j-1}, \xi_{i,j}, \xi_{i-1,j-1}, \xi_{i-1,j}\},$


which entails that


(23) $E\Delta_{s,t}^4 \le E\{\xi_{i,j-1}^4 + \xi_{i,j}^4 + \xi_{i-1,j-1}^4 + \xi_{i-1,j}^4\}$
$\quad \le b_{i,j-1} + b_{i,j} + b_{i-1,j-1} + b_{i-1,j}$ (by (22))
$\quad \le 4 b_{i-1,j} = 16\kappa_d^2 [(j-(i-1))/n]^2,$
where the last inequality follows from $0 < j-i-1 < j-i < j-(i-1)$.
Note that (21), $i < j-1$ and $i, j$ integers imply that
(24) $3(t - s) \ge [j - (i-1)]/n.$
From (23) and (24) one obtains
(25) $E\Delta_{s,t}^4 \le 144\kappa_d^2 (t - s)^2.$
Case 2. $i = j$. In this case $(i-1)/n \le s, t \le i/n$. From (18) one has
$\Delta_{s,t} = n|t - s|\, \xi_{i-1,i},$


so that from (22)
(26) $E\Delta_{s,t}^4 \le n^4 (t - s)^4 \cdot 4\kappa_d^2 n^{-2} \le 4\kappa_d^2 (t - s)^2.$
The last inequality follows because $n|t - s| \le 1$.
Case 3. $i = j - 1$. By the triangle inequality,
$\Delta_{s,t} \le 2\max(\Delta_{s,i/n}, \Delta_{i/n,t}).$


Thus by Case 2, applied once with $s$ and $i/n$ and once with $i/n$ and $t$,
one obtains


$E\Delta_{s,t}^4 \le 2^4 \{E\Delta_{s,i/n}^4 + E\Delta_{i/n,t}^4\}$


(27) $\quad \le 2^6 \kappa_d^2 \{(i/n - s)^2 + (t - i/n)^2\} \le 2^6 \kappa_d^2 (t - s)^2.$
In view of (27), (26) and (25), the proof of (19) is complete.

To prove (20), let $d_{i+} = \max(0, d_i)$, $d_{i-} = \max(0, -d_i)$. Then one has
$d_i = d_{i+} - d_{i-}$. Decompose $W_d$ and $Z_d$ accordingly. Note that $\max(d_{i+}^2, d_{i-}^2) = d_{i+}^2 + d_{i-}^2 = d_i^2$, $1 \le i \le n$. This and (N1) imply that $\tau_{d+} := \sum_i d_{i+}^2 \le 1$ and $\tau_{d-} := \sum_i d_{i-}^2 \le 1$.


It also implies that if (B) is satisfied by the $\{d_i\}$ then it is also satisfied by
$\{d_{i+}\}$ and $\{d_{i-}\}$. By the triangle inequality,


(28) $\|W_d - Z_d\|_\infty \le \|W_{d+} - Z_{d+}\|_\infty + \|W_{d-} - Z_{d-}\|_\infty.$


Moreover $d_{i+} \ge 0$ and $d_{i-} \ge 0$ for all $i$. Therefore, it is enough to prove (20) for
$d_i \ge 0$, $1 \le i \le n$. Accordingly, suppose that is the case. Then


(29) $\|W_d - Z_d\|_\infty \le U_1 + U_2,$
where
(30) $U_1 := \max_{1 \le i \le n} \sup_{(i-1)/n \le t \le i/n} |W_d(t) - W_d((i-1)/n)|,$
$\quad U_2 := \max_{1 \le i \le n} \sup_{(i-1)/n \le t \le i/n} |W_d(t) - W_d(i/n)|.$


For $(i-1)/n \le t \le i/n$, and $d_j \ge 0$, $1 \le j \le n$,
$|W_d(t) - W_d(i/n)| \le |\sum_j d_j I(t < \eta_j \le i/n)| + \sum_j d_j [G_j(i/n) - G_j(t)]$
$\quad \le |W_d(i/n) - W_d((i-1)/n)| + 2\sum_j d_j [G_j(i/n) - G_j((i-1)/n)]$
(31) $\quad \le \xi_{i-1,i} + 2\max_j d_j$, by (D).




Therefore, by (22), (30), (31) and the Markov inequality, for every
$\epsilon > 0$ and for $n$ sufficiently large such that $2\max_j d_j < \epsilon$, the existence of
which is guaranteed by (B),

$P(U_2 > \epsilon) \le P(\max_i \xi_{i-1,i} > \epsilon - 2\max_j d_j)$
$\quad \le (\epsilon - 2\max_j d_j)^{-4} \sum_{i=1}^n E\xi_{i-1,i}^4$
(32) $\quad \le (\epsilon - 2\max_j d_j)^{-4} \cdot 4\kappa_d^2 n^{-1} \to 0.$
Exactly similar calculations show that $U_1 = o_p(1)$. □

Proof of Theorem 2.2a.2. Observe that $Z_d(0) = 0 = Z_d(1)$ and that
$Z_d \in C[0, 1]$ for every $n \ge 1$ and each sequence $\{d_i\}$. Hence by (19) and
Theorem A.2 of the Appendix, $\{Z_d\}$ is tight in $C[0, 1]$. Thus claim (i)
follows from (20). To prove (ii) just argue as in the proof of (ii) of Theorem
2.2a.1 above. □

The following corollary will be useful later on. To state it we need
some more notation. Let $F_{n1}, \ldots, F_{nn}$ be d.f.'s on $R$ and $X_{ni}$ be a r.v.
with d.f. $F_{ni}$, $1 \le i \le n$. Define


(33) $H(x) := n^{-1}\sum_i F_{ni}(x), \ x \in R; \qquad H^{-1}(t) := \inf\{x;\ H(x) \ge t\}, \ 0 \le t \le 1;$
$L_{ni}(t) := F_{ni}(H^{-1}(t)), \ 1 \le i \le n; \qquad L_d(t) := \sum_i d_{ni}^2 L_{ni}(t);$
$W_d^*(t) := \sum_i d_{ni}\{I(X_{ni} \le H^{-1}(t)) - L_{ni}(t)\}, \quad 0 \le t \le 1.$
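The objects in (33) are straightforward to compute when the $F_{ni}$ are continuous. The following sketch assumes, purely for illustration, logistic d.f.'s with hypothetical location shifts, and finds $H^{-1}(t) = \inf\{x;\ H(x) \ge t\}$ by bisection on a bracketing interval. Since $H$ is then continuous, the average of the $L_{ni}(t)$ should return $t$, which anticipates Remark 2.2a.3.

```python
import math


def logistic_cdf(x):
    """Standard logistic d.f., written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_H(cdfs):
    """H(x) = n^{-1} sum_i F_ni(x), as in (33)."""
    return lambda x: sum(F(x) for F in cdfs) / len(cdfs)


def h_inverse(H, t, lo=-100.0, hi=100.0, iters=200):
    """H^{-1}(t) = inf{x : H(x) >= t} by bisection.

    Assumes H is nondecreasing and H(lo) < t <= H(hi), i.e. 0 < t < 1 here.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if H(mid) >= t:
            hi = mid
        else:
            lo = mid
    return hi


def L_values(cdfs, t):
    """L_ni(t) = F_ni(H^{-1}(t)), i = 1, ..., n."""
    x = h_inverse(make_H(cdfs), t)
    return [F(x) for F in cdfs]


if __name__ == "__main__":
    shifts = [-1.0, 0.0, 2.0]      # hypothetical location shifts
    cdfs = [lambda x, m=m: logistic_cdf(x - m) for m in shifts]
    vals = L_values(cdfs, 0.3)
    print(vals, sum(vals) / len(vals))   # the average is (numerically) 0.3
```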


Corollary 2.2a.1. Assume that


(34) $X_{n1}, \ldots, X_{nn}$ are independent r.v.'s with respective d.f.'s $F_{n1}, \ldots, F_{nn}$
on $R$.


In addition, suppose that $\{d_{ni}\}$, $\{F_{ni}\}$ satisfy (N1), (N2) and
(C*) $\lim_{\delta \to 0} \limsup_n \sup_{0 \le t \le 1-\delta} [L_d(t + \delta) - L_d(t)] = 0.$
Then, for every $\epsilon > 0$,

(35) $\lim_{\delta \to 0} \limsup_n P(\sup_{|t-s| < \delta} |W_d^*(t) - W_d^*(s)| > \epsilon) = 0.$


Proof. Follows from Theorem 2.2a.1(i) applied to $\eta_i = H(X_i)$,
$G_i \equiv L_i$, $1 \le i \le n$. □


Remark 2.2a.3. Note that if $H$ is continuous then $n^{-1}\sum_i L_{ni}(t) = t$.
Therefore,
(36) $\sup_{0 \le t \le 1-\delta} [L_d(t + \delta) - L_d(t)] \le n \max_i d_i^2 \, \delta.$


Thus, if we strengthen (N2) to require (B) then (C*) is a priori satisfied.
That is, the conditions of Theorem 2.2a.2(i) are satisfied.


If $F_{ni} \equiv F$, $F$ a continuous d.f., then $L_{ni}(t) \equiv t$. Therefore, in view
of (N1), (C*) is a priori satisfied. Moreover $C_d(s, t) = \mathrm{Cov}(W_d^*(s), W_d^*(t))
= s(1 - t)$, $0 \le s \le t \le 1$. Therefore we obtain


Corollary 2.2a.2. Suppose that $X_{n1}, \ldots, X_{nn}$ are i.i.d. $F$, $F$ a
continuous d.f. Suppose that $\{d_{ni}\}$ satisfy (N1) and (N2). Then $W_d^* \Rightarrow B$
in $(D[0, 1], \rho)$ with $B$ a Brownian bridge in $C[0, 1]$. □


Observe that $d_{ni} \equiv n^{-1/2}$ satisfy (N1) and (N2). In other words, the
above corollary includes the celebrated result, viz., the weak
convergence of the sequence of the ordinary empirical processes.
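Corollary 2.2a.2 says that, whatever weights satisfying (N1) and (N2) are used, the limit is the same Brownian bridge; in particular $\mathrm{Var}\, W_d^*(t) = t(1 - t)$ for every $n$. A small Monte Carlo sketch, assuming i.i.d. Uniform(0, 1) observations (so that $H^{-1}(t) = t$) and a hypothetical set of unequal weights:

```python
import random


def simulate_w_star(d, t, reps=10000, seed=1):
    """Monte Carlo draws of W_d^*(t) when X_ni are i.i.d. Uniform(0, 1).

    Here F is continuous, H = F and H^{-1}(t) = t, so L_ni(t) = t and
    W_d^*(t) = sum_i d_i [I(X_i <= t) - t].
    """
    rng = random.Random(seed)
    return [sum(di * ((1.0 if rng.random() <= t else 0.0) - t) for di in d)
            for _ in range(reps)]


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    raw = list(range(1, 51))                     # hypothetical unequal weights
    norm = sum(r * r for r in raw) ** 0.5
    d = [r / norm for r in raw]                  # (N1); max d^2 is about 0.06
    print(sample_variance(simulate_w_star(d, 0.3)))  # should be near 0.3 * 0.7 = 0.21
```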


Note. A variant of Theorem 2.2a.1 was first proved in Koul (1970). The above
formulation and proof are based on this work and that of Withers (1975). Theorem 2.2a.2
is motivated by the work of Shorack (1973), which deals only with the weak convergence


of the $W_1$-process, the process $W_d$ with $d_{ni} \equiv n^{-1/2}$. The sufficiency of condition (D)
for (16) was observed by Eyster (1977). □


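2.2b. $V_h$-Processes


In this subsection we shall investigate the weak convergence of the r.w.e.p.'s
$\{V_h(x), x \in R\}$ of (1.4.1). To state the general result we need some more
structure on the underlying r.v.'s.
Accordingly, let $(\Omega, \mathcal{A}, P)$ be a probability space and $G$ be a d.f. on
$R$. For each integer $n \ge 1$, let $(\zeta_{ni}, h_{ni}, \delta_{ni})$, $1 \le i \le n$, be an array of
trivariate r.v.'s defined on $(\Omega, \mathcal{A})$ such that $\{\zeta_{ni}, 1 \le i \le n\}$ are i.i.d. $G$
r.v.'s and $\zeta_{ni}$ is independent of $(h_{ni}, \delta_{ni})$ for each $1 \le i \le n$. Furthermore,
let $\{\mathcal{A}_{ni}\}$ be an array of sub $\sigma$-fields such that $\mathcal{A}_{ni} \subset \mathcal{A}_{n,i+1}$, $\mathcal{A}_{ni} \subset \mathcal{A}_{n+1,i}$, $1
\le i \le n$, $n \ge 1$; $(h_{n1}, \delta_{n1})$ is $\mathcal{A}_{n1}$-measurable; the r.v.'s $\{\zeta_{n1}, \ldots, \zeta_{n,j-1}; (h_{ni},
\delta_{ni}), 1 \le i \le j\}$ are $\mathcal{A}_{nj}$-measurable, $2 \le j \le n$; and $\zeta_{nj}$ is independent of
$\mathcal{A}_{nj}$, $1 \le j \le n$. Define


(1) $V_h(x) := n^{-1}\sum_{i=1}^n h_{ni} I(\zeta_{ni} \le x + \delta_{ni}), \qquad V_h^*(x) := n^{-1}\sum_{i=1}^n h_{ni} I(\zeta_{ni} \le x),$
$J_h(x) := n^{-1}\sum_{i=1}^n E[h_{ni} I(\zeta_{ni} \le x + \delta_{ni}) \mid \mathcal{A}_{ni}] = n^{-1}\sum_{i=1}^n h_{ni} G(x + \delta_{ni}),$
$J_h^*(x) := n^{-1}\sum_{i=1}^n h_{ni} G(x),$
$U_h(x) := n^{1/2}[V_h(x) - J_h(x)], \qquad U_h^*(x) := n^{1/2}[V_h^*(x) - J_h^*(x)], \quad x \in R.$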
We are now ready to state the following


Theorem 2.2b.1. In addition to the above, assume that the following
conditions hold:


(A1) $\sup_n \max_i |h_{ni}| \le c$, a.s., for some constant $c < \infty$.
(A2) $\max_i |\delta_{ni}| = o_p(1)$.


(A3) $n^{-1/2}\sum_i |h_{ni}\delta_{ni}| = O_p(1)$.


(A4) $G$ has a uniformly continuous density $g$ w.r.t. $\lambda$, and $g > 0$, a.e.
Then

(2) $\|U_h - U_h^*\|_\infty = o_p(1).$

If, in addition,

(A5) $n^{-1}\sum_i h_{ni}^2 \to \alpha^2$ in probability, $\alpha$ a r.v.,

then

(3) $U_h \Rightarrow \alpha \cdot B(G), \qquad U_h^* \Rightarrow \alpha \cdot B(G),$


where $B$ is a Brownian bridge in $C[0, 1]$, independent of $\alpha$.

The proof of (2) uses a restricted chaining argument and an
exponential inequality for martingales with bounded differences. It will be a
consequence of the following two lemmas.


Lemma 2.2b.1. Under (A1)-(A4), $\forall \epsilon > 0$ and for $r = 1, 2$,
$\lim_n P\big(\sup n^{-1/2}\sum_i |h_{ni}|^r |G(y + \delta_{ni}) - G(x + \delta_{ni})| \le 2c^r\epsilon\big) = 1,$
where the supremum is taken over the set $\{x, y \in R;\ n^{1/2}|G(x) - G(y)| \le \epsilon\}$.
Proof. Let $\epsilon > 0$, $q(u) := g(G^{-1}(u))$, $0 \le u \le 1$; $\mu_n := \max_i |\delta_{ni}|$,


$\omega_n := \sup\{|q(u) - q(v)|;\ |u - v| \le \epsilon n^{-1/2}\}$
$\quad = \sup\{|g(x) - g(y)|;\ |G(x) - G(y)| \le \epsilon n^{-1/2}\},$
$\Delta_n := \sup\{|g(x) - g(y)|;\ |y - x| \le \mu_n\}.$
By (A4), $q$ is uniformly continuous on [0, 1]. Hence, by (A2),
(4) $\Delta_n = o_p(1), \qquad \omega_n = o(1).$
But
$\sup n^{-1/2}\sum_i |h_i|^r |G(y + \delta_i) - G(x + \delta_i)|$
$\quad \le \sup n^{-1/2}\sum_i |h_i|^r |G(y) - G(x)| + n^{-1/2}\sum_i |h_i|^r |\delta_i| [\omega_n + 2\Delta_n]$
$\quad \le c^r\epsilon + O_p(1) \cdot o_p(1)$, by (A1), (A3) and (4).
This completes the proof of the Lemma. □
Lemma 2.2b.2. Let $\{\mathcal{F}_i, i \ge 0\}$ be an increasing sequence of $\sigma$-fields,
$m$ be a positive integer, $\tau \le m$ be a stopping time relative to $\{\mathcal{F}_i\}$ and $\{\xi_i,
1 \le i \le m\}$ be a sequence of real valued martingale differences w.r.t. $\{\mathcal{F}_i\}$. In
addition, suppose that


(i) $|\xi_i| \le M < \infty$, for some constant $M$, $1 \le i \le m$,
and

(ii) $\sum_{i=1}^{\tau} E(\xi_i^2 \mid \mathcal{F}_{i-1}) \le L$, for a constant $L < \infty$.


Then, for every $a > 0$,
(5) $P(|\sum_{i=1}^{\tau} \xi_i| \ge a) \le 2\exp\{-(a/2M)\,\mathrm{arcsinh}(Ma/2L)\}.$
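Proof. Write $\sigma_i^2 = E(\xi_i^2 \mid \mathcal{F}_{i-1})$, $i \ge 1$. First, consider the
case $\tau \equiv m$:


Recall the following elementary facts: For all $x \in R$,


(6a) $\exp(x) - x - 1 \le 2(\cosh x - 1) \le x\sinh x,$
(6b) $(\sinh x)/x$ is increasing in $|x|$,
(6c) $x \le \exp(x - 1).$


Because $E(\xi_i \mid \mathcal{F}_{i-1}) = 0$ and by (i), for a $\delta > 0$ and for all $1 \le i \le m$,
$E\{[\exp(\delta\xi_i) - 1] \mid \mathcal{F}_{i-1}\} \le E\{\delta\xi_i \sinh(\delta\xi_i) \mid \mathcal{F}_{i-1}\}$, by (6a)


(7) $\quad \le \sigma_i^2 \delta \sinh(\delta M)/M$, by (6b).


Use a conditioning argument to obtain
$E\exp\{\delta\sum_{i=1}^m \xi_i\}$
$\quad = E[\exp(\delta\sum_{i=1}^{m-1}\xi_i)\, E\{\exp(\delta\xi_m) \mid \mathcal{F}_{m-1}\}]$
$\quad \le E[\exp(\delta\sum_{i=1}^{m-1}\xi_i) \exp(E\{\exp(\delta\xi_m) \mid \mathcal{F}_{m-1}\} - 1)]$, by (6c)
$\quad \le E[\exp(\delta\sum_{i=1}^{m-1}\xi_i) \exp(\sigma_m^2 \cdot (\delta/M) \sinh(\delta M))]$, by (7)
$\quad \le E[\exp(\delta\sum_{i=1}^{m-1}\xi_i) \exp\{(L - \sum_{i=1}^{m-1}\sigma_i^2) \cdot (\delta/M) \sinh(\delta M)\}]$.
Observe that $L - \sum_{i=1}^{j-1}\sigma_i^2$ is $\mathcal{F}_{j-2}$-measurable, for all $j \ge 2$. Hence, iterating
the above argument $m - 1$ times will give
$E\exp\{\delta\sum_{i=1}^m \xi_i\} \le \exp\{L (\delta/M) \sinh(\delta M)\}.$
Now, by the Markov inequality, $\forall a > 0$,
$P(\sum_{i=1}^m \xi_i \ge a) \le E\exp\{\delta(\sum_{i=1}^m \xi_i - a)\} \le \exp\{\delta[(L/M)\sinh(\delta M) - a]\}.$
The choice of $\delta = (1/M)\,\mathrm{arcsinh}(Ma/2L)$ in this leads to the inequality
$P(\sum_{i=1}^m \xi_i \ge a) \le \exp\{(-a/2M)\,\mathrm{arcsinh}(Ma/2L)\}.$


An application of this inequality to $\{-\xi_i\}$ will yield the same bound for


$P(\sum_{i=1}^m \xi_i \le -a)$, thereby completing the proof of (5) in the case $\tau \equiv m$. Now

consider the
general case $\tau \le m$:


Let $\chi_j = \xi_j I(j \le \tau)$. Because the event $[j \le \tau] \in \mathcal{F}_{j-1}$, it follows that
$\{\chi_j, \mathcal{F}_j\}$ satisfy the conditions of the previous case. Hence,


$P(|\sum_{i=1}^{\tau}\xi_i| \ge a) = P(|\sum_{i=1}^m \chi_i| \ge a) \le 2\exp\{(-a/2M)\,\mathrm{arcsinh}(Ma/2L)\}.$ □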
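The bound (5) and the choice of $\delta$ in the proof can be examined numerically. The sketch below (function names hypothetical) evaluates the right side of (5), confirms that $\delta = (1/M)\,\mathrm{arcsinh}(Ma/2L)$ turns the exponent into $-(a/2M)\,\mathrm{arcsinh}(Ma/2L)$, and compares the bound with a Monte Carlo tail probability for i.i.d. Rademacher signs, where $M = 1$ and $L = m$. The Rademacher case is only an illustration, not part of the lemma.

```python
import math
import random


def tail_bound(a, M, L):
    """Right side of (5): 2 exp{-(a / 2M) arcsinh(Ma / 2L)}."""
    return 2.0 * math.exp(-(a / (2.0 * M)) * math.asinh(M * a / (2.0 * L)))


def exponent_at(delta, a, M, L):
    """delta [(L / M) sinh(delta M) - a]: log of the Markov bound for a given delta."""
    return delta * ((L / M) * math.sinh(delta * M) - a)


def rademacher_tail(m, a, reps=5000, seed=0):
    """Monte Carlo estimate of P(|xi_1 + ... + xi_m| >= a) for i.i.d. +-1 signs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(m))
        hits += abs(s) >= a
    return hits / reps


if __name__ == "__main__":
    m, a = 100, 30.0
    M, L = 1.0, float(m)          # |xi_i| <= 1 and sum of E(xi_i^2 | F_{i-1}) = m
    delta = math.asinh(M * a / (2.0 * L)) / M   # the choice of delta in the proof
    print(exponent_at(delta, a, M, L), -(a / (2.0 * M)) * math.asinh(M * a / (2.0 * L)))
    print(rademacher_tail(m, a), "<=", tail_bound(a, M, L))  # a few thousandths vs about 0.21
```

The bound is far from sharp in this example; its value in the proof lies in needing only (i) and (ii), which the truncation by a stopping time makes available.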


Proof of Theorem 2.2b.1. For the clarity of the proof it is important
to emphasize the dependence of various underlying processes on $n$.
Accordingly, we shall write $V_n$, $U_n$ etc. for $V_h$, $U_h$ etc. in the proof.


On $R$ define the metric $d(x, y) := |G(x) - G(y)|^{1/2}$. This metric
makes $R$ totally bounded. Thus, to prove the theorem, it suffices to prove


(a) $\forall y \in R$, $|U_n(y) - U_n^*(y)| = o_p(1)$,
(b) $\forall \epsilon > 0\ \exists\, \delta > 0 \ni$
(i) $\limsup_n P(\sup_{d(x,y) < \delta} |U_n(y) - U_n(x)| > \epsilon) < \epsilon$,


(ii) $\limsup_n P(\sup_{d(x,y) < \delta} |U_n^*(y) - U_n^*(x)| > \epsilon) < \epsilon$.


Proof of (a). The fact that $U_n - U_n^*$ is a sum of conditionally centered
bounded r.v.'s yields that


$\mathrm{Var}(U_n(y) - U_n^*(y)) \le E\, n^{-1}\sum_i h_i^2 |G(y + \delta_i) - G(y)| = o(1),$
by (A1), (A2), (A4) and the Dominated Convergence Theorem.
Proof of (b)(i). The following proof of (b)(i) uses a restricted chaining
argument as discussed in Pollard (1984, pp. 160-162), and the exponential
inequality of Lemma 2.2b.2 above.


Fix an $\epsilon > 0$. Let $a_n := [n^{1/2}/\epsilon]$, the greatest integer less than or
equal to $n^{1/2}/\epsilon$, and define the grid


$\mathcal{H}_n := \{y_j;\ G(y_j) = j\epsilon n^{-1/2},\ 1 \le j \le a_n\}, \quad n \ge 1.$
Also let

$Z_i(x) := I(\zeta_i \le x + \delta_i) - G(x + \delta_i), \quad x \in R, \ 1 \le i \le n.$
Write $h_i = h_{i+} - h_{i-}$, $h_{i+} = \max(0, h_i)$, $h_{i-} = \max(0, -h_i)$, so that

$U_n(x) = n^{-1/2}\sum_{i=1}^n h_{i+} Z_i(x) - n^{-1/2}\sum_{i=1}^n h_{i-} Z_i(x) = U_n^+(x) - U_n^-(x)$, say.


Thus to prove (b)(i), by the triangle inequality, it suffices to prove it for the $U_n^\pm$
processes. The details of the proof shall be given for the $U_n^+$ process only,
those for the $U_n^-$ being similar.


Next, we need to define the sequence of stopping times


$\tau_n := n \wedge \max\Big\{k \ge 1;\ \max_{x, y \in \mathcal{H}_n} \sum_{i=1}^k (h_{i+})^2 E\{[Z_i(x) - Z_i(y)]^2 \mid \mathcal{A}_i\} / d^2(x, y) \le 3c^2 n\Big\}.$
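The truncation by $\tau_n$ is the device that makes condition (ii) of Lemma 2.2b.2 hold with a deterministic $L$. The sketch below abstracts it: each pair $(x, y)$ contributes a sequence of nonnegative predictable terms, and $\tau$ is the last index at which every partial sum stays at or below the threshold. The input format and names are assumptions made for illustration.

```python
def truncation_time(increments_by_pair, threshold):
    """tau = min(n, max{k >= 1 : max over pairs of the k-th partial sum <= threshold}).

    increments_by_pair[p][i - 1] is a nonnegative predictable term for pair p,
    standing in for (h_i^+)^2 E{[Z_i(x) - Z_i(y)]^2 | A_i} / d^2(x, y).
    Returns 0 if the condition already fails at k = 1 (a convention assumed here).
    """
    n = len(increments_by_pair[0])
    partial = [0.0] * len(increments_by_pair)
    tau = 0
    for k in range(1, n + 1):
        for p, inc in enumerate(increments_by_pair):
            partial[p] += inc[k - 1]
        if max(partial) > threshold:
            break          # partial sums only grow, so no larger k can qualify
        tau = k
    return tau


def stopped_sum(xs, tau):
    """sum_{i <= tau} xi_i, i.e. sum_i chi_i with chi_i = xi_i I(i <= tau)."""
    return sum(x for i, x in enumerate(xs, start=1) if i <= tau)


if __name__ == "__main__":
    incs = [[1.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.0, 0.0]]   # two hypothetical pairs
    tau = truncation_time(incs, 2.5)
    print(tau, stopped_sum([1.0, -1.0, 1.0, 1.0], tau))   # 2 0.0
```

Because the $k$-th increments are $\mathcal{A}_k$-measurable, whether $\tau \ge k$ is known before the $k$-th summand is revealed, which is exactly the property $[j \le \tau] \in \mathcal{F}_{j-1}$ used in Lemma 2.2b.2.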


Observe that $\tau_n \le n$. To adapt the present situation to that of Pollard,
we first prove that $P(\tau_n < n) \to 0$ (see (8) below). This allows one to work
with $n^{-1/2}\tau_n^{1/2} U_{\tau_n}^+$ instead of $U_n^+$. By Lemma 2.2b.2 and the fact that
$\mathrm{arcsinh}(x)$ is increasing and concave in $x \ge 0$, one obtains that if $x, y$ in $\mathcal{H}_n$
are such that $d^2(x, y) \ge t\epsilon n^{-1/2}$ then


$P(n^{-1/2}\tau_n^{1/2} |U_{\tau_n}^+(x) - U_{\tau_n}^+(y)| \ge t) \le 2\exp\Big\{-\frac{t^2}{2c\, d^2(x, y)}\, \epsilon\, \mathrm{arcsinh}\big(1/(6\epsilon c)\big)\Big\}$, for all $t > 0$.


This enables one to carry out the chaining argument as in Pollard. What
remains to be done is to connect the points in $R$ to points in $\mathcal{H}_n$,
which will be done in (9) below. We shall now prove
(8) $P(\tau_n < n) \to 0.$

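Proof of (8). For $y_j, y_k$ in $\mathcal{H}_n$ with $y_j < y_k$, $d^2(y_j, y_k) = (k - j)\epsilon n^{-1/2}$.
Hence, using the fact that $(h_{i+})^2 \le h_i^2$,

$\sum_{i=1}^n (h_{i+})^2 E\{[Z_i(y_k) - Z_i(y_j)]^2 \mid \mathcal{A}_i\} / d^2(y_j, y_k)$
$\quad \le \sum_{i=1}^n h_i^2 [G(y_k + \delta_i) - G(y_j + \delta_i)] \{(k - j)\epsilon\}^{-1} n^{1/2}$
$\quad = \{(k - j)\epsilon\}^{-1} n^{1/2} \sum_{i=1}^n h_i^2 \sum_{r=j}^{k-1} [G(y_{r+1} + \delta_i) - G(y_r + \delta_i)]$
$\quad \le \epsilon^{-1} n^{1/2} \max_{1 \le r < a_n} \sum_{i=1}^n h_i^2 [G(y_{r+1} + \delta_i) - G(y_r + \delta_i)].$


Now apply Lemma 2.2b.1 with $r = 2$ to obtain
$P\big(\max_{1 \le j < k \le a_n} [\sum_{i=1}^n (h_{i+})^2 E\{[Z_i(y_k) - Z_i(y_j)]^2 \mid \mathcal{A}_i\} / d^2(y_j, y_k)] \le 3c^2 n\big) \to 1.$


This completes the proof of (8).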


Next, for each $x \in R$, let $y_{j_x}$ denote the point in $\mathcal{H}_n$ that is the
closest to $x$ in $d$-metric from the points in $\mathcal{H}_n$ that satisfy $y_{j_x} \le x$. We
shall now prove: $\forall \epsilon > 0$,


(9) $P(\sup_x |U_n^+(x) - U_n^+(y_{j_x})| > 8c\epsilon) \to 0.$


Proof of (9). Now write $V_n^+$, $J_n^+$ for $V_n$, $J_n$ when $\{h_i\}$ in these
quantities is replaced by $\{h_{i+}\}$.
The definition of $y_{j_x}$, $G$ increasing, and the fact that $h_{i+} \le |h_i|$ for
all $i$, imply that
$\sup_x |n^{1/2}[J_n^+(x) - J_n^+(y_{j_x})]|$
$\quad \le \sup_x n^{-1/2}\sum_i |h_i|\, |G(y_{j_x+1} + \delta_i) - G(y_{j_x} + \delta_i)|.$


An application of Lemma 2.2b.1 with $r = 1$ now yields that
(10) $P(\sup_x |n^{1/2}[J_n^+(x) - J_n^+(y_{j_x})]| > 4c\epsilon) \to 0.$


But $h_{i+} \ge 0$, $1 \le i \le n$, implies that $V_n^+$ is nondecreasing in $x$. Therefore,
using the definition of $y_{j_x}$,


$n^{-1/2}[U_n^+(y_{j_x-1}) - U_n^+(y_{j_x})] + J_n^+(y_{j_x-1}) - J_n^+(y_{j_x})$
$\quad = V_n^+(y_{j_x-1}) - V_n^+(y_{j_x}) \le V_n^+(x) - V_n^+(y_{j_x}) \le V_n^+(y_{j_x+1}) - V_n^+(y_{j_x})$
$\quad = n^{-1/2}[U_n^+(y_{j_x+1}) - U_n^+(y_{j_x})] + J_n^+(y_{j_x+1}) - J_n^+(y_{j_x}).$
Hence,
(11) $\sup_x |n^{1/2}[V_n^+(x) - V_n^+(y_{j_x})]| \le 2\max_j |U_n^+(y_{j+1}) - U_n^+(y_j)|$
$\quad + 2\max_j |n^{1/2}[J_n^+(y_{j+1}) - J_n^+(y_j)]|.$
Thus, (9) will follow from (10), (11) and
(12) $P(\max_j |U_n^+(y_{j+1}) - U_n^+(y_j)| > c\epsilon) \to 0.$
In view of (8), to prove (12), it suffices to show that
(13) $P(\max_j n^{-1/2}\tau_n^{1/2} |U_{\tau_n}^+(y_{j+1}) - U_{\tau_n}^+(y_j)| > c\epsilon) \to 0.$


But,
l.h.s. (13) $\le \sum_{j=1}^{a_n} P(|\sum_{i=1}^{\tau_n} h_{i+}[Z_i(y_{j+1}) - Z_i(y_j)]| > c\epsilon n^{1/2}).$


Now apply Lemma 2.2b.2 with $\xi_i = h_{i+}[Z_i(y_{j+1}) - Z_i(y_j)]$, $\mathcal{F}_{i-1} = \mathcal{A}_i$, $\tau = \tau_n$,
$M = c$, $a = c\epsilon n^{1/2}$, $m = n$. By the definition of $\tau_n$, $L = 3c^2\epsilon n^{1/2}$. Hence,
by Lemma 2.2b.2,


$P(|\sum_{i=1}^{\tau_n} h_{i+}[Z_i(y_{j+1}) - Z_i(y_j)]| > c\epsilon n^{1/2}) \le 2\exp[-(\epsilon n^{1/2}/2)\,\mathrm{arcsinh}(1/6)].$
Since this bound does not depend on $j$, it follows that
l.h.s. (13) $\le 2\epsilon^{-1} n^{1/2} \exp[-(\epsilon n^{1/2}/2)\,\mathrm{arcsinh}(1/6)] \to 0.$


This completes the proof of (9) for $U_n^+$. As mentioned earlier, the proof of (9)
for $U_n^-$ is exactly similar, thereby completing the proof of (b)(i).


Adapt the above proof of (b)(i) with $\delta_i \equiv 0$ to conclude (b)(ii). Note
that (b)(ii) holds solely under (A1) and the assumption that $G$ is continuous
and strictly increasing; the other assumptions are not required here. The
proof of (2) is now complete.


The claim (3) follows from (2), (b)(ii) above, Lemma A.3 of the
Appendix and the Cramér-Wold device. □
As noted in the proof of the above theorem, the weak convergence of
$U_h^*$ holds under (A1), (A5) and the assumption that $G$ is continuous
and strictly increasing alone. For an easy reference later on we state this result as
Corollary 2.2b.1. Let the setup of Theorem 2.2b.1 hold. Assume that


$G$ is continuous and strictly increasing and that (A1), (A5) hold. Then $U_h^*
\Rightarrow \alpha \cdot B(G)$, where $B$ is a Brownian bridge in $C[0, 1]$, independent of $\alpha$. □


Remark 2.2b.1. Consider the process $W_h^*(t) := U_h^*(G^{-1}(t))$, $0 \le t \le 1$.


Now work with the metric $|t - s|^{1/2}$ on [0, 1]. Upon repeating the arguments
in the proof of the above theorem, modified appropriately, one can readily
conclude the following


Corollary 2.2b.2. Let the setup of Theorem 2.2b.1 hold. Assume that
$G$ is continuous and that (A1), (A5) hold. Then $W_h^* \Rightarrow \alpha \cdot B$, where $B$ is a
Brownian bridge in $C[0, 1]$, independent of $\alpha$. □


Remark 2.2b.2. Suppose that in Theorem 2.2a.2 the r.v.'s $\eta_{n1}, \ldots,
\eta_{nn}$ are i.i.d. Uniform [0, 1]. Then, upon choosing $h_{ni} = n^{1/2}d_{ni}$, $\zeta_{ni} =
G^{-1}(\eta_{ni})$, one sees that $U_h^* = W_d(G)$, provided $G$ is continuous. Moreover,
the condition (D) is a priori satisfied, (B) is equivalent to (A1) and (N1)
implies (A5) trivially. Consequently, for this special setup, Theorem 2.2a.2
is a special case of Corollary 2.2b.2. But in general these two results serve
different purposes. Theorem 2.2a.1 is the most general for the independent
setup given there and cannot be deduced from Theorem 2.2b.1. □


Note: The inequality (5) and its proof appear in Levental (1989). See
also Proposition 3.1 in Johnson, Schechtman and Zinn (1985). The proof of Theorem
2.2b.1 has its roots in Levental and Koul (1989) and Koul (1991). It was recently
generalized by Koul and Ossiander (1992) to include unbounded weights. □


2.3. ASYMPTOTIC UNIFORM LINEARITY (A.U.L.) OF RESIDUAL
W.E.P.'s.


In this section we shall obtain the asymptotic uniform linearity (a.u.l.) of
residual w.e.p.'s. It will be observed that the asymptotic continuity property
of the type specified in Theorem 2.2a.1(i) is the basic tool to obtain this
result. Accordingly, let $\{X_{ni}\}$, $\{F_{ni}\}$, $\{H\}$ and $\{L_{ni}\}$ be as in (2.2a.33) and


define
(1) $S_d(t, u) := \sum_i d_{ni} I(X_{ni} \le H^{-1}(t) + c_{ni}'u),$
$\mu_d(t, u) := \sum_i d_{ni} F_{ni}(H^{-1}(t) + c_{ni}'u),$
$Y_d(t, u) := S_d(t, u) - \mu_d(t, u), \quad 0 \le t \le 1, \ u \in R^p,$


where $\{c_{ni}, 1 \le i \le n\}$ are $p \times 1$ vectors of real numbers. We also need
(2a) $S_d^\circ(x, u) := \sum_i d_{ni} I(X_{ni} \le x + c_{ni}'u),$
$\mu_d^\circ(x, u) := \sum_i d_{ni} F_{ni}(x + c_{ni}'u),$


$Y_d^\circ(x, u) := S_d^\circ(x, u) - \mu_d^\circ(x, u), \quad -\infty \le x \le \infty, \ u \in R^p.$


Clearly, if $H$ is strictly increasing then $S_d^\circ(x, u) = S_d(H(x), u)$. A similar
remark applies to other functions.


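Throughout the text, any w.e.p. with weights $d_{ni} \equiv n^{-1/2}$ will be
indicated by the subscript 1. Thus, e.g., $\forall\, -\infty \le x \le \infty$, $u \in R^p$,


(2b) $S_1^\circ(x, u) = n^{-1/2}\sum_i I(X_{ni} \le x + c_{ni}'u),$
$Y_1^\circ(x, u) = n^{-1/2}\sum_i \{I(X_{ni} \le x + c_{ni}'u) - F_{ni}(x + c_{ni}'u)\}.$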
Theorem 2.3.1. In addition to (2.2a.34), (N1), (N2), and (C*), assume
that the d.f.'s $\{F_{ni}, 1 \le i \le n\}$ have densities $\{f_{ni}, 1 \le i \le n\}$ w.r.t. $\lambda$ such that
the following hold:


(3a) $\lim_{\delta \to 0} \limsup_n \max_i \sup_{|x-y| < \delta} |f_{ni}(x) - f_{ni}(y)| = 0,$
(3b) $\max_{1 \le i \le n,\, n \ge 1} \|f_{ni}\|_\infty \le k < \infty.$


In addition, assume that


(4) $\max_i \|c_{ni}\| = o(1)$

and

(5) $\sum_i \|d_{ni} c_{ni}\| = O(1).$

Then, for every $0 < B < \infty$,

(6) $\sup |S_d(t, u) - S_d(t, 0) - u'\sum_i d_{ni} c_{ni} q_{ni}(t)| = o_p(1),$


where $q_{ni} := f_{ni}(H^{-1})$, $1 \le i \le n$, and the supremum is taken over $0 \le t \le 1$,
$\|u\| \le B$.
Consequently, if $H$ is strictly increasing on $R$, then


(7) $\sup |S_d^\circ(x, u) - S_d^\circ(x, 0) - u'\sum_i d_{ni} c_{ni} f_{ni}(x)| = o_p(1),$


where the supremum is taken over $-\infty \le x \le \infty$, $\|u\| \le B$.
Theorem 2.3.1 is a consequence of the following four lemmas. In these
lemmas the setup is as in the theorem.


In what follows, $N(B) := \{u \in R^p;\ \|u\| \le B\}$; $\sup$ stands for the
supremum over $0 \le t \le 1$ and $u \in N(B)$, unless mentioned otherwise. Let


(8) $\nu_d(t) := \sum_i d_{ni} c_{ni} q_{ni}(t),$
$R_d(t, u) := S_d(t, u) - S_d(t, 0) - u'\nu_d(t), \quad 0 \le t \le 1, \ u \in R^p.$
Lemma 2.3.1. Under (3), (4) and (5),


(9) $\sup |\mu_d(t, u) - \mu_d(t, 0) - u'\nu_d(t)| = o(1).$


Proof. Let $\theta_n = B\max_i \|c_i\|$. By (3), $\{F_i\}$ are uniformly
differentiable for sufficiently large $n$, uniformly in $1 \le i \le n$. Hence, by the Mean Value Theorem,


l.h.s. (9) $\le (\sum_i \|d_i c_i\|) \max_i \sup_{|x-y| \le \theta_n} |f_i(x) - f_i(y)| = o(1),$
by (3), (4) and (5). □
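Lemma 2.3.1 is a deterministic statement, so its rate can be inspected directly. The sketch assumes, for illustration only, $p = 1$, logistic $F_{ni} \equiv F$, $d_{ni} = n^{-1/2}$ and the hypothetical design $c_{ni} = n^{-1/2}(2i/n - 1)$, which satisfies (4) and (5). The supremum in (9) is approximated over a finite grid of $x$ and over $u = \pm B$.

```python
import math


def logistic_cdf(x):
    """Standard logistic d.f."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_pdf(x):
    """Its density f = F (1 - F)."""
    F = logistic_cdf(x)
    return F * (1.0 - F)


def linearization_error(n, B=1.0, grid=201):
    """Grid approximation of sup_{x, u = +-B} |mu(x, u) - mu(x, 0) - u nu(x)| in the
    scalar case, with F_ni = logistic, d_ni = n^{-1/2}, c_ni = n^{-1/2}(2i/n - 1)."""
    d = n ** -0.5
    c = [d * (2.0 * i / n - 1.0) for i in range(1, n + 1)]
    sum_dc = sum(d * ci for ci in c)
    worst = 0.0
    for k in range(grid):
        x = -6.0 + 12.0 * k / (grid - 1)
        F0, f0 = logistic_cdf(x), logistic_pdf(x)
        for u in (-B, B):
            diff = sum(d * (logistic_cdf(x + ci * u) - F0) for ci in c)
            worst = max(worst, abs(diff - u * sum_dc * f0))
    return worst


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, linearization_error(n))   # shrinks roughly like n^{-1/2} in this design
```

In this design the leftover term is of second order, about $n^{-1/2} f'(x) u^2 / 6$, which is why quadrupling $n$ roughly halves the error.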
Lemma 2.3.2. Under (N1), (N2), (C*), (3), (4) and (5), $\forall u \in N(B)$,


(10) $\sup_{0 \le t \le 1} |Y_d(t, u) - Y_d(t, 0)| = o_p(1).$


Proof. Fix a $u \in N(B)$. The lemma will follow if we show
(i) $Y_d(t, u) - Y_d(t, 0) = o_p(1)$ for each $0 \le t \le 1$,
and
(ii) $\forall \epsilon > 0$, and for $b = u$ or $b = 0$,


$\lim_{\delta \to 0} \limsup_n P(\sup_{|t-s| < \delta} |Y_d(t, b) - Y_d(s, b)| > \epsilon) = 0.$


Since $Y_d(\cdot, 0) = W_d^*(\cdot)$ of (2.2a.33), for $b = 0$, (ii) follows from
(2.2a.35) of Corollary 2.2a.1.
To verify (ii) for $b = u$, take $\eta_i = H(X_i - c_i'u)$, $1 \le i \le n$, in (2.2a.1).


Then $Y_d(\cdot, u) = W_d(\cdot)$ of (2.2a.1) and $G_i(\cdot) = F_i(H^{-1}(\cdot) + c_i'u)$, $1 \le i \le n$.
Moreover,
(11) $\sup_{0 \le t \le 1-\delta} \sum_i d_i^2 [F_i(H^{-1}(t + \delta) + c_i'u) - F_i(H^{-1}(t) + c_i'u)]$
$\quad \le 2Bk\max_i \|c_i\| + \sup_{0 \le t \le 1-\delta} [L_d(t + \delta) - L_d(t)]$, by (3)


$\quad = o(1)$ as $n \to \infty$, then $\delta \to 0$, by (4) and (C*).


Hence (C) is satisfied by the above $\{G_i\}$. The other conditions being (N1)
and (N2), which are also assumed here, it follows that the above $\{\eta_i\}$ and
$\{d_i\}$ satisfy the conditions of Theorem 2.2a.1. Thus (ii) for $b = u$
follows from Theorem 2.2a.1(i). Hence (ii) is proved.


To obtain (i), note that


$\mathrm{Var}[Y_d(t, u) - Y_d(t, 0)] \le \sum_i d_i^2 |F_i(H^{-1}(t) + c_i'u) - F_i(H^{-1}(t))|$
$\quad \le Bk\max_i \|c_i\|$, by (3) and (N1),
$\quad = o(1)$, by (4).
This together with the Chebyshev inequality yields (i) and hence (10). □


To state and prove the next lemma we need some more notation. Let
$\kappa_{ni} = \|c_{ni}\|$, $1 \le i \le n$, and define


(12) $S_d^*(t, u, b) := \sum_i d_{ni} I(X_{ni} \le H^{-1}(t) + c_{ni}'u + b\kappa_{ni}),$
$\mu_d^*(t, u, b) := E\, S_d^*(t, u, b),$
$Y_d^*(t, u, b) := S_d^*(t, u, b) - \mu_d^*(t, u, b), \quad 0 \le t \le 1, \ u \in R^p, \ b \in R.$


Lemma 2.3.3. Let $u \in N(B)$ and $|b| < \infty$ be fixed. Under (N1), (N2), (C*), (3), (4) and (5), $\forall \epsilon > 0$,


(13) $\lim_{\delta \to 0} \limsup_n P(\sup_{|t-s| < \delta} |Y_d^*(t, u, b) - Y_d^*(s, u, b)| \ge \epsilon) = 0.$
Proof. In Theorem 2.2a.1(i), take $\eta_i = H(X_i - c_i'u - b\kappa_i)$, $1 \le i \le n$. Then
$W_d(\cdot) = Y_d^*(\cdot, u, b)$ and $G_i(\cdot) = F_i(H^{-1}(\cdot) + c_i'u + b\kappa_i)$, $1 \le i \le n$. Again,
similar to (11),
$\sup_{0 \le t \le 1-\delta} [G_d(t + \delta) - G_d(t)] \le 2k(B + |b|)\max_i \|c_i\| + \sup_{0 \le t \le 1-\delta} [L_d(t + \delta) - L_d(t)]$
$\quad = o(1)$, by (4) and (C*).
Hence (13) follows from Theorem 2.2a.1(i). □


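Lemma 2.3.4. Under (N1), (N2), (C*), (3), (4) and (5), $\forall \epsilon > 0$
there is a $\delta > 0$ such that for every $v \in N(B)$,


(14) $\limsup_n P(\sup_{t,\ \|u - v\| \le \delta} |R_d(t, u) - R_d(t, v)| \ge \epsilon) = 0,$


where $R_d$ is defined at (8).
Proof. Assume, without loss of generality, that $d_i \ge 0$, $1 \le i \le n$.


For, otherwise write $d_i = d_{i+} - d_{i-}$, $1 \le i \le n$, where $\{d_{i+}, d_{i-}\}$ are as in the
proof of Lemma 2.2a.4. Then $S_d = S_{d+} - S_{d-}$ and $R_d = R_{d+} - R_{d-}$, where $S_{d\pm}$, $R_{d\pm}$ are
computed with the weights $\{d_{i\pm}\}$. Let $\tau_{d+} = \sum_i d_{i+}^2$, $\tau_{d-} = \sum_i d_{i-}^2$. In view of (N1), $\tau_{d+} \le 1$, $\tau_{d-} \le 1$.
Moreover, if $\{d_i\}$ satisfy (N2) and (5) above, so do $\{d_{i+}, d_{i-}\}$ because


$d_{i+}^2 \vee d_{i-}^2 = d_{i+}^2 + d_{i-}^2 = d_i^2$, $1 \le i \le n$. Hence the triangle inequality will yield
(14), if proved for $R_{d+}$ and $R_{d-}$. But note that $d_{i+} \ge 0$ and $d_{i-} \ge 0$ for all $i$.


Now, $\|u - v\| \le \delta$ implies
(15) $-\delta\kappa_i + c_i'v \le c_i'u \le \delta\kappa_i + c_i'v, \quad \kappa_i = \|c_i\|, \ 1 \le i \le n.$
Therefore, because $d_i \ge 0$ for all $i$,
(16) $S_d^*(t, v, -\delta) \le S_d(t, u) \le S_d^*(t, v, \delta)$ for all $t$,
yielding
(17) $L_1(t, u, v) := S_d^*(t, v, -\delta) - S_d(t, v) - (u - v)'\nu_d(t)$
$\quad \le R_d(t, u) - R_d(t, v)$
$\quad \le S_d^*(t, v, \delta) - S_d(t, v) - (u - v)'\nu_d(t) =: L_2(t, u, v).$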
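The sandwich (16) is what reduces the supremum over $u$ to finitely many monotone processes. The sketch below checks it on randomly generated data; it assumes nonnegative weights, treats `base` as a stand-in for $H^{-1}(t)$, and uses hypothetical helper names.

```python
import random


def weighted_count(xs, d, thresholds):
    """sum_i d_i I(x_i <= threshold_i)."""
    return sum(di * (1.0 if x <= th else 0.0) for x, di, th in zip(xs, d, thresholds))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sandwich_holds(xs, d, c, base, u, v, delta):
    """Check (16): S*(t, v, -delta) <= S(t, u) <= S*(t, v, delta).

    Assumes d_i >= 0 and ||u - v|| <= delta; c[i], u, v are p-vectors (lists).
    """
    kappa = [dot(ci, ci) ** 0.5 for ci in c]
    lower = weighted_count(xs, d, [base + dot(ci, v) - delta * k for ci, k in zip(c, kappa)])
    middle = weighted_count(xs, d, [base + dot(ci, u) for ci in c])
    upper = weighted_count(xs, d, [base + dot(ci, v) + delta * k for ci, k in zip(c, kappa)])
    return lower <= middle <= upper


def random_trials(trials=200, n=25, p=3, delta=0.2, seed=0):
    """Draw random configurations with ||u - v|| < delta and check (16) for each."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = [rng.random() for _ in range(n)]                  # nonnegative weights
        c = [[rng.gauss(0.0, 0.3) for _ in range(p)] for _ in range(n)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        w = [rng.gauss(0.0, 1.0) for _ in range(p)]
        scale = 0.9 * delta * rng.random() / dot(w, w) ** 0.5
        u = [vi + scale * wi for vi, wi in zip(v, w)]         # ||u - v|| < delta
        if not sandwich_holds(xs, d, c, rng.uniform(-1.0, 1.0), u, v, delta):
            return False
    return True


if __name__ == "__main__":
    print(random_trials())   # True
```

The check relies only on Cauchy-Schwarz, $|c_i'(u - v)| \le \kappa_i \|u - v\|$, and on the monotonicity of each indicator in its threshold, exactly as in (15) and (16).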
We shall show that there is a $\delta > 0$ such that for every $v \in N(B)$,


(18) $P(\sup_{t,\ \|u - v\| \le \delta} |L_j(t, u, v)| > \epsilon) = o(1), \quad j = 1, 2.$


We shall first prove (18) for $L_2$. Observe that


(19) $|L_2(t, u, v)| \le |Y_d^*(t, v, \delta) - Y_d^*(t, v, 0)|$
$\quad + |\mu_d^*(t, v, \delta) - \mu_d^*(t, v, 0)| + |(u - v)'\nu_d(t)|.$
The Mean Value Theorem, (3), and $\|u - v\| \le \delta$ imply
(20) $\sup |\mu_d^*(t, v, \delta) - \mu_d^*(t, v, 0)| \le \delta k \sum_i \|d_i c_i\|,$
$\quad \sup |(u - v)'\nu_d(t)| \le k\delta \sum_i \|d_i c_i\|.$


Let $M(t)$ denote the first term on the r.h.s. of (19), i.e.,


$M(t) = Y_d^*(t, v, \delta) - Y_d^*(t, v, 0), \quad 0 \le t \le 1.$
(21) Claim: $\sup_t |M(t)| = o_p(1)$.
To begin with,
$\mathrm{Var}(M(t)) \le \sum_i d_i^2 [F_i(H^{-1}(t) + c_i'v + \delta\kappa_i) - F_i(H^{-1}(t) + c_i'v)]$
$\quad \le \delta k \max_i \kappa_i$, by (3a), (3b) and (N1),
$\quad = o(1)$, by (4).
Hence
(22) $M(t) = o_p(1)$ for every $0 \le t \le 1$.


Next, note that, for a $\gamma > 0$,
$\sup_{|t-s| < \gamma} |M(t) - M(s)| \le \sup_{|t-s| < \gamma} |Y_d^*(t, v, \delta) - Y_d^*(s, v, \delta)|$
$\quad + \sup_{|t-s| < \gamma} |Y_d^*(t, v, 0) - Y_d^*(s, v, 0)|.$


Apply Lemma 2.3.3 twice, once with $b = \delta$ and once with $b = 0$, to obtain
that $\forall \epsilon > 0$,


(23) $\lim_{\gamma \to 0} \limsup_n P(\sup_{|t-s| < \gamma} |M(t) - M(s)| > \epsilon) = 0.$


But (23) and (22) imply the Claim (21).

Now choose $\delta > 0$ so that
(24) $\limsup_n \delta k \sum_i \|d_i c_i\| < \epsilon/3$ (use (5) here).
From (19), (20) and (21) one readily obtains
$\limsup_n P(\sup_{t,\ \|u - v\| \le \delta} |L_2(t, u, v)| > \epsilon) \le \limsup_n P(\sup_t |M(t)| > \epsilon/3) = 0.$


This proves (18) for $L_2$. A similar argument proves (18) for $L_1$ with the
same $\delta$ as in (24), thereby completing the proof of the Lemma. □


Proof of Theorem 2.3.1. Fix an $\epsilon > 0$ and choose a $\delta > 0$ satisfying
(24). By the compactness of $N(B)$ there exist points $v_1, \ldots, v_r$ in $N(B)$
such that for any $u \in N(B)$, $\|u - v_j\| \le \delta$ for some $j = 1, 2, \ldots, r$. Thus
$\limsup_n P(\sup |R_d(t, u)| \ge \epsilon)$
$\quad \le \sum_{j=1}^r \limsup_n P(\sup_{t,\ \|u - v_j\| \le \delta} |R_d(t, u) - R_d(t, v_j)| \ge \epsilon/2)$
$\quad + \sum_{j=1}^r \limsup_n P(\sup_t |R_d(t, v_j)| \ge \epsilon/2) = 0$


by Lemmas 2.3.1, 2.3.2 and 2.3.4. □
Remark 2.3.1. Upon a reexamination of the above proof one finds
that Theorem 2.3.1 is a consequence solely of the continuity of certain w.e.p.'s
and the smoothness of $\{F_{ni}\}$. Note that the above proof does not use the
full force of the weak convergence of these w.e.p.'s. □
Remark 2.3.2. By the relationship
$R_d(t, u) = Y_d(t, u) - Y_d(t, 0) + \mu_d(t, u) - \mu_d(t, 0) - u'\nu_d(t)$
and by Lemma 2.3.1, (6) of Theorem 2.3.1 is equivalent to


(25) $\sup_{0 \le t \le 1,\ u \in N(B)} |Y_d(t, u) - Y_d(t, 0)| = o_p(1).$


This will be useful when dealing with w.e.p.'s based on ranks in Chapter 3. □
The above theorem needs to be extended and reformulated when
dealing with a linear regression model with an unknown scale parameter or
with M-estimators in the presence of a preliminary scale estimator. To that
end, define, for $x, s \in R$, $0 \le t \le 1$, $u \in R^p$,
(26) $S_d(s, t, u) := \sum_i d_{ni} I(X_{ni} \le (1 + sn^{-1/2})H^{-1}(t) + c_{ni}'u),$
$S_d^\circ(s, x, u) := \sum_i d_{ni} I(X_{ni} \le (1 + sn^{-1/2})x + c_{ni}'u),$


and define $Y_d(s, t, u)$, $\mu_d(s, t, u)$ similarly. We are now ready to state
Theorem 2.3.2. In addition to the assumptions of Theorem 2.3.1, assume that

(27) $\max_{i,n} \sup_x |x f_{ni}(x)| \le k < \infty.$

Then

(28) $\sup |S_d(s,t,u) - S_d(0,t,0) - \sum_i d_{ni}\{s n^{-1/2} H^{-1}(t) + c_{ni}'u\}\, q_{ni}(t)| = o_p(1),$

where the supremum is taken over $|s| \le b$, $u \in \mathcal{N}(B)$, $0 \le t \le 1$. Consequently, if $H$ is strictly increasing for all $n \ge 1$, then

(29) $\sup |\mathcal{S}_d(s,x,u) - \mathcal{S}_d(0,x,0) - \sum_i d_{ni}\{s n^{-1/2} x + c_{ni}'u\}\, f_{ni}(x)| = o_p(1),$

where the supremum is taken over $|s| \le b$, $u \in \mathcal{N}(B)$ and $x \in \mathbb{R}$.


Sketch of proof. The argument is quite similar to that of Theorem 2.3.1. We briefly indicate the modifications of the previous proof. An analogue of Lemma 2.3.1 will now assert

$\sup |\mu_d(s,t,u) - \mu_d(0,t,0) - \{(n^{-1/2}\sum_i d_{ni}\, q_{ni}(t) H^{-1}(t))\, s + u'\nu_d(t)\}| = o(1).$

This uses (3), (4), (5), (27) and (N1).

An analogue of Lemma 2.3.2 is obtained by applying Theorem 2.2a.1(i) to $\eta_i := H((X_i - c_i'u)\sigma_n^{-1})$, $1 \le i \le n$, $\sigma_n := 1 + s n^{-1/2}$. This states that for every $|s| \le b$ and every $u \in \mathcal{N}(B)$,

(30) $\sup_{0\le t\le 1} |Y_d(s,t,u) - Y_d(s,t,0)| = o_p(1).$

In verifying (C) for these $\{\eta_i\}$, one has an analogue of (11):

$\sup_{0\le t\le 1-\delta} [G_d(t+\delta) - G_d(t)] \le 2k\{B \max_i \|c_i\| + b n^{-1/2}\} + \sup_{0\le t\le 1-\delta} [L_d(t+\delta) - L_d(t)].$

Note that here $G_d(t) = \sum_i d_i^2 F_i(\sigma_n H^{-1}(t) + c_i'u)$.

One similarly has an analogue of Lemma 2.3.3. Consequently, from Theorem 2.3.1 one can conclude that for each fixed $s \in [-b, b]$,

(31) $\sup_{0\le t\le 1,\ \|u\|\le B} |R_d(s,t,u)| = o_p(1),$

where $R_d(s,t,u)$ equals the l.h.s. of (28) without the supremum. To complete the proof, once again exploit the compactness of $[-b, b]$ and the monotonic structure that is present in $S_d$ and $\mu_d$. Details are left to interested readers. □


Consider now the specialization of Theorems 2.3.1 and 2.3.2 to the case when $F_{ni} \equiv F$, $F$ a d.f. Note that in this case (N1) implies that $L_d(t) \equiv t$, so that (C*) is a priori satisfied. To state these specializations we need the following assumptions:

(F1) $F$ has a uniformly continuous density $f$ w.r.t. $\lambda$.
(F2) $f > 0$, a.e. $\lambda$.
(F3) $\sup_{x\in\mathbb{R}} |x f(x)| \le k < \infty$.

Note that (F1) implies that $f$ is bounded and that (F2) implies that $F$ is strictly increasing.

Corollary 2.3.1. Let $X_{n1}, \dots, X_{nn}$ be i.i.d. $F$. In addition, suppose that (N1), (N2), (4), (5) and (F1) hold. Then (6) holds with $q_{ni} \equiv f(F^{-1})$. If, in addition, (F2) holds, then (7) holds with $f_{ni} \equiv f$. □

Corollary 2.3.2. Let $X_{n1}, \dots, X_{nn}$ be i.i.d. $F$. In addition, suppose that (N1), (N2), (4), (5), (F1) and (F3) hold. Then (28) holds with $H = F$ and $q_{ni} \equiv f(F^{-1})$. If, in addition, (F2) holds, then (29) holds with $f_{ni} \equiv f$. □

We shall now apply the above results to the model (1.1.1) and the $\{V_j\}$-processes of (1.1.2). The results thus obtained are useful in studying the asymptotic distributions of certain goodness-of-fit tests and a class of M-estimators of $\beta$ of (1.1.1) when there is also an unknown scale parameter. We need the following assumption about the design matrix $X$.

(NX) $(X'X)^{-1}$ exists for all $n \ge p$; $\max_i x_{ni}'(X'X)^{-1}x_{ni} = o(1)$.

This is Noether's condition for the design matrix $X$. Now, let

(32) $A := (X'X)^{-1/2}$, $D := XA$,
$q(t) := (q_{n1}(t), \dots, q_{nn}(t))'$, $\Lambda(t) := \mathrm{diag}(q(t))$,
$\Gamma_1(t) := A X'\Lambda(t) X A$, $\Gamma_2(t) := n^{-1/2} H^{-1}(t)\, D'q(t)$, $0 \le t \le 1$.
Write $D = ((d_{ij}))$, $1 \le i \le n$, $1 \le j \le p$, and let $d_{(j)}$ denote the $j$th column of $D$. Note that $D'D = I_{p\times p}$. This in turn implies that

(33) (N1) is satisfied by $d_{(j)}$ for all $1 \le j \le p$.

Moreover, with $a_{(j)}$ denoting the $j$th column of $A$,

(34) $\max_i d_{ij}^2 = \max_i (x_i'a_{(j)})^2 \le \max_i x_i'\big(\sum_{k=1}^p a_{(k)}a_{(k)}'\big)x_i = \max_i x_i'(X'X)^{-1}x_i = o(1)$, by (NX).
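The identity $D'D = I_{p\times p}$ and the bound (34) are easy to check numerically. The sketch below assumes a hypothetical design with an intercept and one Uniform$[-1,1]$ covariate ($p = 2$), forms $A = (X'X)^{-1/2}$ through the closed-form square root of a $2\times 2$ positive definite matrix, and compares each $\max_i d_{ij}^2$ with the largest leverage $\max_i x_i'(X'X)^{-1}x_i$.

```python
import math
import random

def inv_sqrt_2x2(m):
    """Symmetric inverse square root of a 2x2 symmetric positive definite matrix.

    Uses sqrt(M) = (M + s I) / t with s = sqrt(det M) and t = sqrt(trace M + 2 s),
    then inverts the resulting 2x2 matrix.
    """
    a, b, c = m[0][0], m[0][1], m[1][1]
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2.0 * s)
    r = [[(a + s) / t, b / t], [b / t, (c + s) / t]]          # r = M^{1/2}
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [[r[1][1] / det, -r[0][1] / det], [-r[1][0] / det, r[0][0] / det]]

random.seed(1)
n, p = 200, 2
X = [[1.0, random.uniform(-1.0, 1.0)] for _ in range(n)]      # hypothetical design rows x_i'
XtX = [[sum(x[j] * x[k] for x in X) for k in range(p)] for j in range(p)]
A = inv_sqrt_2x2(XtX)                                          # A = (X'X)^{-1/2}
D = [[sum(x[k] * A[k][j] for k in range(p)) for j in range(p)] for x in X]   # D = XA

DtD = [[sum(d[j] * d[k] for d in D) for k in range(p)] for j in range(p)]
XtX_inv = [[sum(A[j][l] * A[l][k] for l in range(p)) for k in range(p)] for j in range(p)]
leverage = max(sum(x[j] * XtX_inv[j][k] * x[k] for j in range(p) for k in range(p)) for x in X)
max_d2 = [max(d[j] ** 2 for d in D) for j in range(p)]

print(DtD)                 # numerically the 2x2 identity matrix
print(max_d2, leverage)    # each column maximum is bounded by the largest leverage, as in (34)
```

Any $A$ with $A'X'XA = I$ would give $D'D = I$; the symmetric square root is used here because that is the choice made in (32).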
Let

(35) $L_j(t) := \sum_i d_{ij}^2 F_i(H^{-1}(t))$, $0 \le t \le 1$, $1 \le j \le p$.


We are now ready to state 


Theorem 2.3.3. Let $\{(x_{ni}, Y_{ni}), 1 \le i \le n\}$, $\beta$, $\{F_{ni}, 1 \le i \le n\}$ be as in the model (1.1.1). In addition, assume that $\{F_{ni}\}$ satisfy (3a), (3b) and that (C*) is satisfied by each $L_j$ of (35), $1 \le j \le p$. Then, for every $0 < B < \infty$,

(36) $\sup \|A\{V(H^{-1}(t), \beta + Au) - V(H^{-1}(t), \beta)\} - \Gamma_1(t)u\| = o_p(1),$

where the supremum is over $0 \le t \le 1$, $u \in \mathcal{N}(B)$.

If, in addition, $H$ is strictly increasing for all $n \ge 1$, then, for every $0 < B < \infty$,

(37) $\sup \|A\{V(x, \beta + Au) - V(x, \beta)\} - \Gamma_1(H(x))u\| = o_p(1),$

where the supremum is over $-\infty < x < \infty$, $u \in \mathcal{N}(B)$.

Theorem 2.3.4. Suppose that $\{(x_{ni}, Y_{ni}), 1 \le i \le n\}$ and $\beta \in \mathbb{R}^p$ obey the model

(38) $Y_{ni} = x_{ni}'\beta + \gamma e_{ni}$, $1 \le i \le n$, $\gamma > 0$,

with $\{e_{ni}\}$ independent r.v.'s having d.f.'s $\{F_{ni}\}$. Assume that (NX) holds. In addition, assume that $\{F_{ni}\}$ satisfy (3a), (3b), (27) and that (C*) is satisfied by each $L_j$ of (35), $1 \le j \le p$. Then for every $0 < b, B < \infty$,

(39) $\sup \|A\{V(aH^{-1}(t), \beta + Au\gamma) - V(\gamma H^{-1}(t), \beta)\} - \Gamma_1(t)u - \Gamma_2(t)v\| = o_p(1),$

where $v := n^{1/2}(a - \gamma)\gamma^{-1}$, $a > 0$, and the supremum is over $0 \le t \le 1$, $u \in \mathcal{N}(B)$ and $|v| \le b$.

If, in addition, $H$ is strictly increasing for every $n \ge 1$, then

(40) $\sup \|A\{V(ax, \beta + Au\gamma) - V(\gamma x, \beta)\} - \Gamma_1(H(x))u - \Gamma_2(H(x))v\| = o_p(1),$

where $v$ is as in (39) and the supremum is over $-\infty < x < \infty$, $u \in \mathcal{N}(B)$ and $|v| \le b$.


Proof of Theorem 2.3.3. Apply Theorem 2.3.1 to $X_i = Y_i - x_i'\beta$, $c_i' = x_i'A$, $1 \le i \le n$. Then $F_i$ is the d.f. of $X_i$, and the $j$th components of $AV(H^{-1}(t), \beta + Au)$ and $AV(H^{-1}(t), \beta)$ are $S_d(t,u)$, $S_d(t,0)$ of (1), respectively, with $d_i = d_{ij}$, $1 \le i \le n$, $1 \le j \le p$. Therefore (36) will follow by $p$ applications of (6), one for each $d_{(j)}$, provided the assumptions of Theorem 2.3.1 are satisfied. But in view of (33) and (34), the assumption (NX) implies (N1), (N2) for $d_{(j)}$, $1 \le j \le p$. Also, (4) for the specified $\{c_i\}$ is equivalent to (NX). Finally, the C-S inequality and (33) verify (5) in the present case. This makes Theorem 2.3.1 applicable and hence (36) follows. □

Proof of Theorem 2.3.4. This follows from Theorem 2.3.2 when applied to $X_i = (Y_i - x_i'\beta)\gamma^{-1}$, $c_i' = x_i'A$, $1 \le i \le n$, in a fashion similar to the proof of Theorem 2.3.3 above. □


The following corollaries follow from Corollaries 2.3.1 and 2.3.2 in the same way as Theorems 2.3.3 and 2.3.4 above follow from Theorems 2.3.1 and 2.3.2. They are stated here for easy reference later on.

Corollary 2.3.3. Suppose that the model (1.1.1) with $F_{ni} \equiv F$ holds. Assume that the design matrix $X$ and the d.f. $F$ satisfy (NX) and (F1). Then, $\forall\, 0 < B < \infty$,

(41) $\sup \|A\{V(F^{-1}(t), s) - V(F^{-1}(t), \beta)\} - f(F^{-1}(t))\, A^{-1}(s - \beta)\| = o_p(1),$

where the supremum is over $0 \le t \le 1$; $s \in \mathbb{R}^p$, $\|A^{-1}(s - \beta)\| \le B$.

If, in addition, $F$ satisfies (F2), then

(42) $\sup \|A\{V(x, s) - V(x, \beta)\} - f(x)\, A^{-1}(s - \beta)\| = o_p(1),$

where the supremum is over $-\infty < x < \infty$; $s \in \mathbb{R}^p$, $\|A^{-1}(s - \beta)\| \le B$. □
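A small Monte Carlo run makes (42) concrete. The sketch below assumes $p = 1$, a hypothetical slope $\beta = 0.5$, design points drawn from Uniform$[1,2]$ and standard normal errors, so that (NX), (F1) and (F2) hold. It takes $V(x, s) = \sum_i x_i I(Y_i - x_i s \le x)$ as the $p = 1$ form of the process of (1.1.2), sets $s = \beta + Au$, and compares $A\{V(x, s) - V(x, \beta)\}$ with $f(x)u$ at a few values of $x$.

```python
import math
import random

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def V(x, s, xs, ys):
    """p = 1 weighted empirical process V(x, s) = sum_i x_i I(Y_i - x_i s <= x)."""
    return sum(xi for xi, yi in zip(xs, ys) if yi - xi * s <= x)

random.seed(2024)
n, beta, u = 20000, 0.5, 1.0
xs = [random.uniform(1.0, 2.0) for _ in range(n)]
ys = [beta * xi + random.gauss(0.0, 1.0) for xi in xs]    # errors i.i.d. N(0, 1)
A = 1.0 / math.sqrt(sum(xi * xi for xi in xs))            # A = (X'X)^{-1/2} when p = 1

errors = []
for x in (-1.0, 0.0, 1.0):
    lhs = A * (V(x, beta + A * u, xs, ys) - V(x, beta, xs, ys))
    errors.append(abs(lhs - normal_pdf(x) * u))
    print(f"x={x:+.1f}  A{{V(x,beta+Au)-V(x,beta)}}={lhs:.3f}  f(x)u={normal_pdf(x) * u:.3f}")
```

The discrepancies are random and of order $n^{-1/4}$ in this set-up, so they shrink only slowly with $n$; (42) asserts that their supremum is $o_p(1)$, not that it is small for moderate $n$.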
Corollary 2.3.4. Suppose that the model (38) with $F_{ni} \equiv F$ holds and that the design matrix $X$ and the d.f. $F$ satisfy (NX), (F1) and (F3). Then (39) holds with

(43) $\Gamma_1(t) = f(F^{-1}(t))\, I_{p\times p}$, $\Gamma_2(t) = n^{-1/2} F^{-1}(t) f(F^{-1}(t))\, AX'\mathbf{1}$, $0 \le t \le 1$,

where $\mathbf{1}$ denotes the $n\times 1$ vector of ones. If, in addition, $F$ satisfies (F2), then (40) holds with $\Gamma_j(H) = \Gamma_j(F)$, $j = 1, 2$, i.e.,

(44) $\sup \|A\{V(ax, \beta + Au\gamma) - V(\gamma x, \beta)\} - f(x)u - n^{-1/2} x f(x)\, AX'\mathbf{1}\, v\| = o_p(1),$

where the supremum is over $-\infty < x < \infty$; $u \in \mathcal{N}(B)$ and $|v| \le b$, with $v$ as in (39). □


We end this section by stating an a.u.l. result about the ordinary residual empirical processes $H_n$ of (1.2.1) for easy reference later on.

Corollary 2.3.5. Suppose that the model (1.1.1) with $F_{ni} \equiv F$ holds. Assume that the design matrix $X$ and the d.f. $F$ satisfy (NX) and (F1). Then, $\forall\, 0 < B < \infty$,

(45) $\sup |n^{1/2}\{H_n(F^{-1}(t), s) - H_n(F^{-1}(t), \beta)\} - f(F^{-1}(t))\, n^{-1/2}\sum_j x_{nj}'A \cdot A^{-1}(s - \beta)| = o_p(1),$

where the supremum is over $0 \le t \le 1$; $s \in \mathbb{R}^p$, $\|A^{-1}(s - \beta)\| \le B$.

If, in addition, $F$ satisfies (F2), then, $\forall\, 0 < B < \infty$,

(46) $\sup |n^{1/2}\{H_n(x, s) - H_n(x, \beta)\} - f(x)\, n^{-1/2}\sum_j x_{nj}'A \cdot A^{-1}(s - \beta)| = o_p(1),$

where the supremum is over $-\infty < x < \infty$; $s \in \mathbb{R}^p$, $\|A^{-1}(s - \beta)\| \le B$.

Proof. The proof follows from Theorem 2.3.1 by specializing it to the case where $d_{ni} \equiv n^{-1/2}$ and the rest of the entities are as in the proof of Theorem 2.3.3. □


Note: Ghosh and Sen (1971) and Koul and Zhu (1991) have proved an almost sure version of (42) in the cases $p = 1$ and $p > 1$, respectively. □


2.4. SOME FURTHER PROBABILISTIC RESULTS FOR W.E.P.’S. 


For the sake of general interest, here we state some further results about 
w.e.p.’s. To begin with, we have 
2.4.1. Laws of the iterated logarithm:


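In this subsection, we assume that

(1) $d_{ni} \equiv d_i$, $\eta_{ni} \equiv \eta_i$, $G_{ni} \equiv G_i$, $1 \le i \le n$.

Define

(2) $U_n(t) := \sum_{i=1}^n d_i\{I(\eta_i \le t) - G_i(t)\}$, $\sigma_n^2 := \sum_{i=1}^n d_i^2$,
$\xi_n(t) := U_n(t)/\{2\sigma_n^2 \ln\ln \sigma_n^2\}^{1/2}$, $n \ge 1$, $0 \le t \le 1$.

Let $r(s,t) := s\wedge t - st$, $0 \le s, t \le 1$, and let $\mathcal{H}(r)$ be the reproducing kernel Hilbert space generated by the kernel $r$, with $\|\cdot\|_r$ denoting the associated norm on $\mathcal{H}(r)$. Let

(3) $K := \{f \in \mathcal{H}(r);\ \|f\|_r \le 1\}$.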
Theorem 2.4.1. If $\eta_1, \eta_2, \dots$ are i.i.d. uniform on $[0,1]$ and $d_1, d_2, \dots$ are any real numbers satisfying

(a) $\lim_n \sigma_n^2 = \infty$, $\lim_n (\max_{1\le i\le n} d_i^2)\, \ln\ln\sigma_n^2/\sigma_n^2 = 0$,

then

$P\big(\rho(\xi_n, K) \to 0$ and the set of limit points of $\{\xi_n\}$ is $K\big) = 1$. □


Theorem 2.4.1 was proved by Vanderzanden (1980, 1984) using some of the results of Kuelbs (1976) and certain martingale properties of $\xi_n$.


Theorem 2.4.2. Let $\eta_1, \eta_2, \dots$ be independent nonnegative r.v.'s. Let $\{d_i\}$ be any real numbers. Then

$\limsup_n \sup_t g_n^{-1}|U_n(t-)| < \infty$, a.s. □


A proof of this appears in Marcus and Zinn (1984). Actually they 
prove some other interesting results about w.e.p.’s with weights which are 
r.v.’s and functions of t. Most of their results, however, are concerned with 
the bounded law of the iterated logarithm. They also proved the following 
inequality that is similar to, yet a generalization of, the classical 
Dvoretzky-Kiefer-Wolfowitz exponential inequality for the ordinary 
empirical process. Their proof is valid for triangular arrays and real r.v.’s. 


Exponential inequality. Let $X_{n1}, X_{n2}, \dots, X_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$ and let $\{d_{ni}\}$ be any real numbers satisfying (N1). Then, $\forall\, \lambda > 0$, $\forall\, n \ge 1$,

(4) $P\big(\sup_{|x|<\infty} |\sum_{i=1}^n d_{ni}\{I(X_{ni} \le x) - F_{ni}(x)\}| \ge \lambda\big) \le [1 + (8\pi)^{1/2}\lambda]\exp(-\lambda^2/8)$. □
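To see when (4) is informative, the sketch below evaluates its right-hand side and compares it with a Monte Carlo estimate of the left-hand side in one hypothetical configuration: $n = 50$ i.i.d. Uniform$[0,1]$ variables and fixed Gaussian weights rescaled so that $\sum_i d_{ni}^2 = 1$, which is taken here to be the content of (N1). Between order statistics the process is linear in $x$, so its supremum is computed exactly at the jump points. The bound falls below 1 only once $\lambda$ is a little above 5, so it acts as a tail bound rather than a sharp approximation.

```python
import math
import random

def mz_bound(lam):
    """Right-hand side of (4): [1 + (8 pi)^{1/2} lam] exp(-lam^2 / 8)."""
    return (1.0 + math.sqrt(8.0 * math.pi) * lam) * math.exp(-lam * lam / 8.0)

def sup_process(xs, ds):
    """Exact sup_x |sum_i d_i {I(X_i <= x) - x}| for Uniform[0,1] data.

    Between order statistics the process equals cum - (sum d) x, which is linear in x,
    so the supremum is attained at a data point or just to its left.
    """
    total_d = sum(ds)
    best, cum = 0.0, 0.0
    for x, d in sorted(zip(xs, ds)):
        best = max(best, abs(cum - total_d * x))   # left limit at x
        cum += d
        best = max(best, abs(cum - total_d * x))   # value at x
    return best

rng = random.Random(7)
n, reps = 50, 2000
raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(r * r for r in raw))
ds = [r / norm for r in raw]                         # sum_i d_i^2 = 1
sups = [sup_process([rng.random() for _ in range(n)], ds) for _ in range(reps)]

for lam in (1.0, 2.0, 6.0, 8.0):
    freq = sum(s >= lam for s in sups) / reps
    print(f"lambda={lam}: empirical {freq:.4f}, bound {mz_bound(lam):.4f}")
```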


The above two theorems immediately suggest some interesting probabilistic questions. For example, is Vanderzanden's result valid for nonidentical r.v.'s $\{\eta_i\}$? Or can one remove the assumption of nonnegative $\{\eta_i\}$ in Theorem 2.4.2?


2.4.2. Weak convergence of w.e.p.'s in D[0, 1], in the q-metric, and an embedding result.


Next, we state a weak convergence result for multivariate r.v.'s. For this we revert to triangular arrays. Now suppose that $\eta_{ni} \in [0,1]^p$, $1 \le i \le n$, are independent r.v.'s of dimension $p$. Define

(5) $W_d(t) := \sum_{i=1}^n d_{ni}\{I(\eta_{ni} \le t) - G_{ni}(t)\}$, $t \in [0,1]^p$.

Let $G_{nij}$ be the $j$th marginal of $G_{ni}$, $1 \le i \le n$, $1 \le j \le p$.


Theorem 2.4.3. Let $\{\eta_{ni}, 1 \le i \le n\}$ be independent $p$-variate r.v.'s and let $\{d_{ni}\}$ satisfy (N1) and (N2). Moreover, suppose that for each $1 \le j \le p$,

$\lim_{\delta\to 0}\limsup_n \sup_{0\le t\le 1-\delta} \sum_{i=1}^n d_{ni}^2\{G_{nij}(t+\delta) - G_{nij}(t)\} = 0.$

Then, for every $\epsilon > 0$,

(i) $\lim_{\delta\to 0}\limsup_n P\big(\sup_{\|s-t\|\le\delta} |W_d(t) - W_d(s)| > \epsilon\big) = 0.$

(ii) Moreover, $W_d \Rightarrow$ some $W$ on $(D[0,1]^p, \rho)$ if, and only if, for each $s, t \in [0,1]^p$, $\mathrm{Cov}(W_d(s), W_d(t)) \to \mathrm{Cov}(W(s), W(t)) =: C(s,t)$.

In this case $W$ is necessarily a Gaussian process, $P(W \in C[0,1]^p) = 1$, and $W(0) = 0 = W(1)$. □


Theorem 2.4.3 is essentially proved in Vanderzanden (1980), using results of Bickel and Wichura (1971).

Mehra and Rao (1975), Withers (1975), and Koul (1977), among others, obtain weak convergence results for $\{W_d\}$-processes when $\{\eta_{ni}\}$ are weakly dependent. See Dehling and Taqqu (1989) and Koul and Mukherjee (1992) for similar results when $\{\eta_{ni}\}$ are long range dependent.

Shorack (1979) proved the weak convergence of the $W_d/q$-process in the $q$-metric, where $q \in \mathcal{Q}$, with

$\mathcal{Q} := \{q;\ q$ a continuous function on $[0,1]$, $q \ge 0$, $q(t) = q(1-t)$, $q(t)\uparrow$ and $t^{-1/2}q(t)\downarrow$ for $0 \le t \le 1/2$, $\int_0^1 q^{-2}(t)\,dt < \infty\}$.
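Theorem 2.4.4. Suppose that $\eta_{n1}, \dots, \eta_{nn}$ are independent r.v.'s in $[0,1]$ with respective d.f.'s $G_{n1}, \dots, G_{nn}$ such that

$n^{-1}\sum_{i=1}^n G_{ni}(t) = t$, $0 \le t \le 1$.

In addition, suppose that $\{d_{ni}\}$ satisfy (N1) and (B). Then,

(i) $\forall\, \epsilon > 0$, $\forall\, q \in \mathcal{Q}$,
$\lim_{\delta\to 0}\limsup_n P\big(\sup_{0\le t\le\delta} |W_d(t)|/q(t) > \epsilon\big) = 0.$

(ii) $q^{-1}W_d \Rightarrow q^{-1}W$, $W$ a continuous Gaussian process with covariance function $C$, if, and only if, $C_d \to C$. □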


Shorack (1991) and Einmahl and Mason (1991) proved the following 
embedding result. 


Theorem 2.4.5. Suppose that $\eta_{n1}, \dots, \eta_{nn}$ are i.i.d. Uniform$[0,1]$ r.v.'s. In addition, suppose that $\{d_{ni}\}$ satisfy (N1) and that

$\sum_{i=1}^n d_{ni} = 0$, $n\sum_{i=1}^n d_{ni}^4 = O(1)$.

Then on a rich enough probability space there exist a sequence of versions $\tilde W_d$ of the processes $W_d$ and a fixed Brownian bridge $\mathbb{B}$ on $[0,1]$ such that

$\sup_{1/n\le t\le 1-1/n} \dfrac{n^{\nu}|\tilde W_d(t) - \mathbb{B}(t)|}{\{t(1-t)\}^{1/2-\nu}} = O_p(1)$, for all $0 \le \nu < 1/4$.

The closed interval $1/n \le t \le 1 - 1/n$ may be replaced by the open interval $\min\{\eta_{nj}; 1 \le j \le n\} < t < \max\{\eta_{nj}; 1 \le j \le n\}$. □
2.4.3. A martingale property. 

In this subsection we shall prove a martingale property of w.e.p.'s. Let $X_{n1}, X_{n2}, \dots, X_{nn}$ be independent real r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$, and let $d_{n1}, \dots, d_{nn}$ be real numbers. Let $a < b$ be fixed real numbers. Define

$M_n(t) := \sum_{i=1}^n d_{ni}\{I(X_{ni} \in (a,t]) - p_{ni}(a,t]\}\{1 - p_{ni}(a,t]\}^{-1}$,
$R_n(t) := \sum_{i=1}^n d_{ni}\{I(X_{ni} \in (t,b]) - p_{ni}(t,b]\}\{1 - p_{ni}(t,b]\}^{-1}$, $t \in \mathbb{R}$,

where

$p_{ni}(s,t] := F_{ni}(t) - F_{ni}(s)$, $s \le t$, $1 \le i \le n$.

Let $T_1 \subset [a, \infty)$, $T_2 \subset (-\infty, b]$ be such that $M_n(t)$ [$R_n(t)$] is well-defined for $t \in T_1$ [$t \in T_2$]. Let

$\mathcal{F}_{1n}(t) := \sigma$-field$\{I(X_{ni} \in (a,s]),\ a \le s \le t,\ i = 1, \dots, n\}$, $t \in T_1$,
$\mathcal{F}_{2n}(t) := \sigma$-field$\{I(X_{ni} \in (s,b]),\ t \le s \le b,\ i = 1, \dots, n\}$, $t \in T_2$.


Martingale Lemma. Under the above setup, for each $n \ge 1$, $\{M_n(t), \mathcal{F}_{1n}(t), t \in T_1\}$ is a martingale and $\{R_n(t), \mathcal{F}_{2n}(t), t \in T_2\}$ is a reverse martingale.


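Proof. Write $q_i(a,s] := 1 - p_i(a,s]$. Because $\{X_i\}$ are independent, for $a \le s \le t$,

$E\{M_n(t)\,|\,\mathcal{F}_{1n}(s)\}$
$= \sum_i d_i\{q_i(a,t]\}^{-1}\big[I(X_i \in (a,s])\, E\{(I(X_i \in (a,t]) - p_i(a,t])\,|\,X_i \in (a,s]\}$
$\qquad + I(X_i \notin (a,s])\, E\{(I(X_i \in (a,t]) - p_i(a,t])\,|\,X_i \notin (a,s]\}\big]$
$= \sum_i d_i\{q_i(a,t]\}^{-1}\big[I(X_i \in (a,s])\, q_i(a,t] + I(X_i \notin (a,s])\{p_i(s,t]/q_i(a,s] - p_i(a,t]\}\big]$
$= \sum_i d_i\{I(X_i \in (a,s]) - p_i(a,s]\}\{q_i(a,s]\}^{-1} = M_n(s)$,

where the last step uses the identity $p_i(s,t] - p_i(a,t]\, q_i(a,s] = -p_i(a,s]\, q_i(a,t]$. A similar argument yields the result about $R_n$. □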
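The heart of the proof is a one-observation computation, which can be verified in exact rational arithmetic. The sketch below takes $d_i = 1$ and hypothetical values $A = p_i(a,s]$, $B = p_i(s,t]$ (so that $p_i(a,t] = A + B$), computes the conditional expectation of the $i$th summand of $M_n(t)$ on each of the two events $[X_i \in (a,s]]$ and $[X_i \notin (a,s]]$, and checks that it equals the $i$th summand of $M_n(s)$.

```python
import random
from fractions import Fraction

def cond_exp_summand(A, B, in_as):
    """E{(I(X in (a,t]) - p(a,t]) / q(a,t] | event at time s}, with A = p(a,s], B = p(s,t]."""
    p_at = A + B
    q_at, q_as = 1 - p_at, 1 - A
    if in_as:
        # X in (a,s] forces X in (a,t], so the indicator is 1.
        return (1 - p_at) / q_at
    # Given X not in (a,s], P(X in (a,t]) = p(s,t] / q(a,s].
    return (B / q_as - p_at) / q_at

def summand_at_s(A, in_as):
    """(I(X in (a,s]) - p(a,s]) / q(a,s], the summand of M_n(s)."""
    return ((1 if in_as else 0) - A) / (1 - A)

rng = random.Random(3)
for _ in range(1000):
    A = Fraction(rng.randint(1, 50), 200)
    B = Fraction(rng.randint(1, 50), 200)      # A + B <= 1/2, so every q is positive
    for in_as in (True, False):
        assert cond_exp_summand(A, B, in_as) == summand_at_s(A, in_as)
print("martingale identity verified on 1000 random (A, B) pairs")
```

Exact fractions are used so that the check is an identity test rather than a tolerance test.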


Note. In the case $\{X_{ni}\}$ are i.i.d. and $d_{ni} \equiv n^{-1/2}$, this Lemma is well known. In the case $\{X_{ni}\}$ are i.i.d. and $\{d_{ni}\}$ are arbitrary, the observation about $\{M_n\}$ being a martingale first appeared in Sinha and Sen (1979). The above Martingale Lemma appears in Vanderzanden (1980, 1984).

Theorem 2.4.1 above generalizes a result of Finkelstein (1971) for the ordinary empirical process to w.e.p.'s of i.i.d. r.v.'s. In fact, the set $K$ is the same as the set $K$ of Finkelstein. □


CHAPTER 3 


LINEAR RANK AND SIGNED RANK STATISTICS


3.1. INTRODUCTION 


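Let $\{X_{ni}, F_{ni}\}$ be as in (2.2.33) and let $\{c_{ni}\}$ be $p\times 1$ real vectors. The rank and the absolute rank of the $i$th residual are defined, respectively, as

(1) $R_{iu} := \sum_{j=1}^n I(X_{nj} - u'c_{nj} \le X_{ni} - u'c_{ni})$,
$R_{iu}^+ := \sum_{j=1}^n I(|X_{nj} - u'c_{nj}| \le |X_{ni} - u'c_{ni}|)$.

Let $\varphi$ be a nondecreasing real valued function on $[0,1]$ and define

(2) $T_d(\varphi, u) := \sum_{i=1}^n d_{ni}\, \varphi(R_{iu}/(n+1))$,
$T_d^+(\varphi, u) := \sum_{i=1}^n d_{ni}\, \varphi^+(R_{iu}^+/(n+1))\, s(X_{ni} - u'c_{ni})$, $u \in \mathbb{R}^p$,

where $\varphi^+(s) := \varphi((s+1)/2)$, $0 \le s \le 1$, and $s(x) := I(x > 0) - I(x < 0)$.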
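For concreteness, the sketch below computes the ranks (1) and the statistics (2) directly from the definitions, for $p = 1$ and a made-up data set without ties; the function names are illustrative only. With $\varphi$ the identity, $\varphi^+(s) = (s+1)/2$.

```python
def ranks(values):
    """R_i = #{j : v_j <= v_i}; the usual ranks when there are no ties."""
    return [sum(vj <= vi for vj in values) for vi in values]

def sgn(x):
    """s(x) = I(x > 0) - I(x < 0)."""
    return (x > 0) - (x < 0)

def residuals(X, C, u):
    """X_i - u'c_i, with each c_i and u given as lists of length p."""
    return [xi - sum(uk * ck for uk, ck in zip(u, ci)) for xi, ci in zip(X, C)]

def T(d, phi, X, C, u):
    """Linear rank statistic T_d(phi, u) of (2)."""
    n = len(X)
    R = ranks(residuals(X, C, u))
    return sum(di * phi(r / (n + 1)) for di, r in zip(d, R))

def T_plus(d, phi, X, C, u):
    """Linear signed rank statistic T_d^+(phi, u) of (2), with phi^+(s) = phi((s + 1) / 2)."""
    n = len(X)
    res = residuals(X, C, u)
    R_abs = ranks([abs(e) for e in res])
    return sum(di * phi((r / (n + 1) + 1) / 2) * sgn(e) for di, r, e in zip(d, R_abs, res))

X = [0.3, -1.2, 2.0, 0.6]
C = [[1.0], [0.0], [2.0], [-1.0]]
d = [1.0, 2.0, 3.0, 4.0]
u = [0.5]
identity = lambda s: s
print(ranks(residuals(X, C, u)))     # [2, 1, 3, 4]
print(T(d, identity, X, C, u))       # about 5.8
print(T_plus(d, identity, X, C, u))  # about 2.9
```

Here the residuals are $-0.2, -1.2, 1.0, 1.1$ and their absolute ranks are $1, 4, 2, 3$, so $T_d^+ = -0.6 - 1.8 + 2.1 + 3.2 = 2.9$.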


The processes $\{T_d(\varphi, u), u \in \mathbb{R}^p\}$ and $\{T_d^+(\varphi, u), u \in \mathbb{R}^p\}$ are used to define rank (R) estimators of $\beta$ in the linear regression model (1.1.1). See, e.g., Adichie (1967), Koul (1971), Jurečková (1971) and Jaeckel (1972). One key property used in studying these R-estimators is the asymptotic uniform linearity (a.u.l.) of $T_d(\varphi, u)$ and $T_d^+(\varphi, u)$ in $u \in \mathcal{N}(B)$. Such results have been proved by Jurečková (1969) for $T_d(\varphi, u)$ for general but fixed functions $\varphi$, by Koul (1969) for $T_d^+(J, u)$ (where $J$ is the identity function) and by Van Eeden (1971) for $T_d^+(\varphi, u)$ for general but fixed $\varphi$ functions. In all of these papers $\{X_{ni}\}$ are assumed to be i.i.d.

In Sections 3.2 and 3.3 below we prove the a.u.l. of $T_d(\varphi, \cdot)$, $T_d^+(\varphi, \cdot)$, uniformly in those $\varphi$ which have $\|\varphi\|_{tv} < \infty$, and under a fairly general independent setting. These proofs reveal that this a.u.l. property is also a consequence of the asymptotic continuity of certain w.e.p.'s and the smoothness of $\{F_{ni}\}$.

Besides being useful in studying the asymptotic distributions of R-estimators of $\beta$, these results are also useful in studying some rank based minimum distance estimators, some goodness-of-fit tests for the error distributions of (1.1.1) and the robustness of R-estimators against certain heteroscedastic errors.


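3.2. ASYMPTOTIC UNIFORM LINEARITY OF LINEAR RANK STATISTICS

At the outset we shall assume

(1) $\varphi \in \mathcal{C} := \{\varphi: [0,1] \to \mathbb{R},\ \varphi \in \mathcal{DI}(0,1),\ \text{with } \|\varphi\|_{tv} := \varphi(1) - \varphi(0) = 1\}$.

Define the w.e.p. based on ranks, with weights $\{d_{ni}\}$,

(2) $Z_d(t,u) := \sum_{i=1}^n d_{ni}\, I(R_{iu} \le nt)$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.

Note that

(3) $T_d(\varphi, u) = \int \varphi(nt/(n+1))\, Z_d(dt, u)$
$= -\int Z_d((n+1)t/n, u)\, d\varphi(t) + n\bar d_n\, \varphi(1)$, $n\bar d_n := \sum_{i=1}^n d_{ni}$.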
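Representation (3) is an integration by parts. Since $Z_d(\cdot, u)$ jumps by $d_{ni}$ at $t = R_{iu}/n$, the first integral is just $\sum_i d_{ni}\varphi(R_{iu}/(n+1))$. The sketch below checks the second form numerically with a midpoint Riemann–Stieltjes sum, for a fixed permutation of ranks, arbitrary weights and the hypothetical score $\varphi(t) = t^2$ (which lies in $\mathcal{C}$). Definition (2) automatically gives $Z_d(t, u) = n\bar d_n$ for $t \ge 1$.

```python
def T_direct(d, phi, R):
    """sum_i d_i phi(R_i / (n + 1))."""
    n = len(R)
    return sum(di * phi(r / (n + 1)) for di, r in zip(d, R))

def T_by_parts(d, phi, R, m=100000):
    """-int_0^1 Z_d((n + 1) t / n) dphi(t) + (sum_i d_i) phi(1), via a midpoint Stieltjes sum."""
    n = len(R)

    def Z(t):
        # Z_d(t) = sum_i d_i I(R_i <= n t), as in (2)
        return sum(di for di, r in zip(d, R) if r <= n * t)

    total = 0.0
    for k in range(m):
        t0, t1 = k / m, (k + 1) / m
        mid = (t0 + t1) / 2
        total += Z((n + 1) * mid / n) * (phi(t1) - phi(t0))
    return -total + sum(d) * phi(1.0)

R = [3, 1, 4, 2, 5]                  # ranks R_iu for some fixed u
d = [0.5, -1.0, 2.0, 0.3, -0.7]
phi = lambda t: t * t
print(T_direct(d, phi, R), T_by_parts(d, phi, R))   # both close to 19.2 / 36
```

The discrepancy comes only from the grid cells that contain a jump point $R_{iu}/(n+1)$, so it is at most $\sum_i |d_{ni}|$ times the largest increment of $\varphi$ over one cell.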

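The representation (3) shows that in order to prove the a.u.l. of $T_d(\varphi, \cdot)$, it suffices to prove it for $Z_d(t, \cdot)$, uniformly in $0 \le t \le 1$. Thus, we shall first prove the a.u.l. property for the $Z_d$-process. Define, for $x \in \mathbb{R}$, $0 \le t \le 1$, $u \in \mathbb{R}^p$,

(4) $H_{nu}(x) := n^{-1}\sum_{i=1}^n I(X_{ni} - c_{ni}'u \le x)$, $H_u(x) := n^{-1}\sum_{i=1}^n F_{ni}(x + c_{ni}'u)$,
$H_{nu}^{-1}(t) := \inf\{x;\ H_{nu}(x) \ge t\}$, $H_u^{-1}(t) := \inf\{x;\ H_u(x) \ge t\}$.

Note that $H_0$ is the $H$ of (2.2a.33). We shall write $H_n$ for $H_{n0}$. Recall that for any d.f. $G$,

$G(G^{-1}(t)) \ge t$, $0 \le t \le 1$, and $G^{-1}(G(x)) \le x$, $x \in \mathbb{R}$.

This fact and the relation $nH_{nu}(X_i - c_i'u) = R_{iu}$ yield that $\forall\, 0 \le t \le 1$,

(5) $[X_i - c_i'u > H_{nu}^{-1}(t)] \subset [R_{iu} > nt] \subset [X_i - c_i'u \ge H_{nu}^{-1}(t)]$, $1 \le i \le n$.

For technical convenience, it is desirable to center the weights of linear rank statistics appropriately. Accordingly, let

(6) $w_{ni} := d_{ni} - \bar d_n$, $1 \le i \le n$.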
Then, with $Z_w$ denoting the $Z_d$ when the weights are $\{w_{ni}\}$,

$Z_d(t,u) = Z_w(t,u) + \bar d_n \cdot [nt]$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.

Hence

(7) $Z_d(t,u) - Z_d(t,0) = Z_w(t,u) - Z_w(t,0)$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.
Next define, for arbitrary real weights $\{d_{ni}\}$,

(8) $\mathcal{Z}_d(t,u) := \sum_{i=1}^n d_{ni}\, I(X_{ni} - c_{ni}'u \le H_{nu}^{-1}(t))$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.

By (5) and direct algebra, for any weights $\{d_{ni}\}$,

(9) $\sup_{t,u} |Z_d(t,u) - \mathcal{Z}_d(t,u)| \le 2\max_i |d_{ni}|$.

Consider the condition

(N3) $\sum_i w_{ni}^2 = 1$, $\max_i w_{ni}^2 \to 0$.

In view of (7) and (9), (N3) implies that the problem of proving the a.u.l. for the $Z_d$-process reduces to proving it for the $\mathcal{Z}_w$-process.
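Both (7) and (9) are finite-sample facts and can be checked directly. The sketch below assumes $p = 1$, continuous (hence tie-free) simulated data and arbitrary positive weights. It evaluates $Z_d$ of (2) from the ranks, and the process of (8) from the order statistic $H_{nu}^{-1}(t)$, which for $t > 0$ is the $\lceil nt\rceil$-th smallest residual. Exact fractions are used for $t$ so that $nt$ is computed without rounding.

```python
import math
import random
from fractions import Fraction

def ranks(values):
    return [sum(vj <= vi for vj in values) for vi in values]

def Z(d, R, t):
    """Z_d(t, u) = sum_i d_i I(R_iu <= n t)."""
    n = len(R)
    return sum(di for di, r in zip(d, R) if r <= n * t)

def Z_script(d, res, t):
    """sum_i d_i I(res_i <= H_nu^{-1}(t)), with H_nu^{-1}(t) = inf{x : H_nu(x) >= t}."""
    if t <= 0:
        return 0.0                      # H_nu^{-1}(0) = -infinity
    k = math.ceil(len(res) * t)         # H_nu^{-1}(t) is the k-th order statistic
    q = sorted(res)[k - 1]
    return sum(di for di, e in zip(d, res) if e <= q)

rng = random.Random(11)
n = 30
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
d = [rng.uniform(0.0, 1.0) for _ in range(n)]
dbar = sum(d) / n
w = [di - dbar for di in d]             # centred weights of (6)
grid = [Fraction(j, 97) for j in range(98)]
R0 = ranks(X)

gap7, gap9 = 0.0, 0.0
for u in (0.4, -1.3, 2.0):
    res = [xi - u * ci for xi, ci in zip(X, c)]
    R = ranks(res)
    for t in grid:
        gap7 = max(gap7, abs((Z(d, R, t) - Z(d, R0, t)) - (Z(w, R, t) - Z(w, R0, t))))
        gap9 = max(gap9, abs(Z(d, R, t) - Z_script(d, res, t)))

print(gap7)                              # (7): zero up to floating-point rounding
print(gap9, 2 * max(abs(x) for x in d))  # (9): the first number does not exceed the second
```

Without ties, $Z_d$ counts ranks up to $\lfloor nt\rfloor$ while (8) counts them up to $\lceil nt\rceil$, so the two processes differ by at most one weight.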
Recall the definitions in (2.3.1) and define

(10) $\mathcal{T}_d(t,u) := \mathcal{Z}_d(t,u) - \mu_d(t,u)$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.

Note the basic decomposition: for any real numbers $\{d_{ni}\}$ and for all $0 \le t \le 1$, $u \in \mathbb{R}^p$,

(11) $\mathcal{T}_d(t,u) = Y_d(HH_{nu}^{-1}(t), u) + \mu_d(HH_{nu}^{-1}(t), u) - \mu_d(t,u)$,

provided $H$ is strictly increasing for all $n \ge 1$. Decomposition (11) is basic to the following proof of the a.u.l. property of $Z_d$.


Theorem 3.2.1. Suppose that $\{X_{ni}, F_{ni}\}$ satisfy (2.2a.34), (N3) holds, and $\{c_{ni}\}$ satisfy (2.3.4) and (2.3.5) with $d_{ni} = w_{ni}$. In addition, assume that (C*) holds with $d_{ni} = w_{ni}$, $H$ is strictly increasing, the densities $\{f_{ni}\}$ of $\{F_{ni}\}$ satisfy (2.3.3b), and that

(12) $\lim_{\delta\to 0}\limsup_n \max_i \sup_{|H(x)-H(y)|\le\delta} |f_{ni}(x) - f_{ni}(y)| = 0$.

Then, for every $0 < B < \infty$,

(13) $\sup |\mathcal{T}_w(t,u) - Y_w(t,0) - \mu_w(HH_{nu}^{-1}(t), 0) + \mu_w(t,0)| = o_p(1)$,

where the supremum is taken over $0 \le t \le 1$, $u \in \mathcal{N}(B)$.

Before proceeding to prove the theorem, we prove the following lemma, which is of independent interest. In this result, no assumptions other than the independence of $\{X_{ni}\}$ are used.


Lemma 3.2.1. Let $H$, $H_n$, $H_u$ and $H_{nu}$ be as in (4) above. Assuming only (2.2a.34), we have

(14) $\|H_n - H\|_\infty \to 0$ a.s.

If, in addition, (2.3.4) holds and if, for any $0 < B < \infty$,

(15) $\sup_{|x-y|\le 2m_nB} |H(x) - H(y)| \to 0$, $(m_n := \max_i \|c_{ni}\|)$,

then

(16) $\sup_{|x|<\infty,\ \|u\|\le B} |H_{nu}(x) - H_u(x)| \to 0$ a.s.

Proof. Note that $H_n(x) - H(x)$ is an average of $n$ centered independent Bernoulli r.v.'s. Thus $E[H_n(x) - H(x)]^4 = O(n^{-2})$. Apply the Markov inequality with the 4th moment and the Borel–Cantelli lemma to obtain

$|H_n(x) - H(x)| \to 0$, a.s., for every $x \in \mathbb{R}$.

Now proceed as in the proof of the Glivenko–Cantelli Lemma (Loève (1963), p. 21) to conclude (14).

To prove (16), note that $u \in \mathcal{N}(B)$ implies that $-m_nB \le c_i'u \le m_nB$, $1 \le i \le n$. The monotonicity of $H_{nu}$ and $H_u$ yields that for $u \in \mathcal{N}(B)$, $x \in \mathbb{R}$,

$H_n(x - Bm_n) - H(x - Bm_n) + H(x - Bm_n) - H(x + Bm_n)$
$\le H_{nu}(x) - H_u(x)$
$\le H_n(x + Bm_n) - H(x + Bm_n) + H(x + Bm_n) - H(x - Bm_n)$.

Hence (16) follows from (14), (15) and the following inequality:

l.h.s. (16) $\le \sup_x |H_n(x) - H(x)| + \sup_{|x-y|\le 2m_nB} |H(x) - H(y)|$. □
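Lemma 3.2.1 needs only independence, not identical distributions. As an illustration, the sketch below assumes heteroscedastic normal errors $X_{ni} \sim N(0, \sigma_i^2)$, with $\sigma_i$ cycling through ten hypothetical values in $[0.5, 2]$, and computes $\|H_n - H\|_\infty$ exactly at the order statistics for increasing $n$. The distance shrinks as $n$ grows, consistent with (14).

```python
import math
import random

def Phi(x):
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

SIGMAS = [0.5 + 1.5 * k / 9 for k in range(10)]   # hypothetical error scales

def sup_distance(n, rng):
    """sup_x |H_n(x) - H(x)| with H = n^{-1} sum_i F_i and F_i = N(0, sigma_i^2).

    n is a multiple of 10, so every scale appears equally often and H is the
    plain average of the ten normal d.f.'s.
    """
    xs = sorted(rng.gauss(0.0, SIGMAS[i % 10]) for i in range(n))
    D = 0.0
    for i, x in enumerate(xs, start=1):
        h = sum(Phi(x / s) for s in SIGMAS) / len(SIGMAS)
        D = max(D, i / n - h, h - (i - 1) / n)   # value at x and left limit at x
    return D

rng = random.Random(5)
dists = {n: sup_distance(n, rng) for n in (200, 2000, 20000)}
for n, D in dists.items():
    print(n, round(D, 4))
```

Because $H$ is continuous and $H_n$ is a step function, the supremum is attained at an order statistic or just to its left, which is why only $2n$ comparisons are needed.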


Proof of Theorem 3.2.1. From (11), for all $0 \le t \le 1$, $u \in \mathbb{R}^p$,

$\mathcal{T}_w(t,u) = [Y_w(HH_{nu}^{-1}(t), u) - Y_w(HH_{nu}^{-1}(t), 0)]$
$\quad + [Y_w(HH_{nu}^{-1}(t), 0) - Y_w(t,0)]$
$\quad + Y_w(t,0) - [\mu_w(t,u) - \mu_w(t,0) - u'\nu_w(t)]$
$\quad + [\mu_w(HH_{nu}^{-1}(t), u) - \mu_w(HH_{nu}^{-1}(t), 0) - u'\nu_w(HH_{nu}^{-1}(t))]$
$\quad + \mu_w(HH_{nu}^{-1}(t), 0) - \mu_w(t,0) + u'[\nu_w(HH_{nu}^{-1}(t)) - \nu_w(t)]$.

Therefore,

l.h.s. (13) $\le \sup|Y_w(t,u) - Y_w(t,0)| + \sup|Y_w(HH_{nu}^{-1}(t), 0) - Y_w(t,0)|$
$\quad + 2\sup|\mu_w(t,u) - \mu_w(t,0) - u'\nu_w(t)|$
$\quad + \sup|u'[\nu_w(HH_{nu}^{-1}(t)) - \nu_w(t)]|$

(17) $= A_1 + A_2 + A_3 + A_4$, say,

where, as usual, the supremum is taken over $0 \le t \le 1$, $u \in \mathcal{N}(B)$. In what follows, the range of $x$ and $y$ over which the supremum is taken is $\mathbb{R}$, unless specified otherwise.

Now, (2.3.3b) implies that $|H(x) - H(y)| \le k|x - y|$. This and (2.3.4) together imply (15). It also implies that

$\sup_{|x-y|\le\delta} |f_{ni}(y) - f_{ni}(x)| \le \sup_{|H(x)-H(y)|\le k\delta} |f_{ni}(y) - f_{ni}(x)|$,

for all $1 \le i \le n$ and all $\delta > 0$. Hence, by (12), it follows that $\{f_{ni}\}$ satisfy (2.3.3a). Now apply Lemma 2.3.1 and (2.3.25), with $d_{ni} = w_{ni}$, $1 \le i \le n$, to conclude that

(18) $A_j = o_p(1)$, $j = 1, 3$.

Next, observe that

(19) $\sup_{t,u} |HH_{nu}^{-1}(t) - t| \le \sup|H_{nu}(x) - H_u(x)| + \sup|H_u(x) - H(x)| + 2n^{-1}$,
$\sup|H_u(x) - H(x)| \le \sup|H(x + m_nB) - H(x - m_nB)|$.
Hence, in view of (19) and Lemma 3.2.1, we obtain

(20) $\sup_{t,u} |HH_{nu}^{-1}(t) - t| \to 0$, a.s.

(We need only the convergence in probability.) Now, fix a $\delta > 0$ and let $B_n := [\sup_{t,u} |HH_{nu}^{-1}(t) - t| \le \delta]$. By (20),

(21) $\limsup_n P(B_n^c) = 0$.

Now observe that $Y_d(\cdot, 0) = W_d(\cdot)$ of (2.2a.33). Hence, with $A_2$ as in (17), for every $\eta > 0$, by (21),

(22) $\limsup_n P(|A_2| > \eta) \le \limsup_n P\big(\sup_{|t-s|\le\delta} |W_w(t) - W_w(s)| > \eta,\ B_n\big)$.

Upon letting $\delta \to 0$ in (22), (2.2a.35) implies

(23) $A_2 = o_p(1)$.

Next, we have

(24) $\lim_{\delta\to 0}\limsup_n \sup_{|t-s|\le\delta} \|\nu_w(t) - \nu_w(s)\|$
$\le \lim_{\delta\to 0}\limsup_n \max_i \sup_{|H(x)-H(y)|\le\delta} |f_{ni}(y) - f_{ni}(x)|\, \big(\sum_i \|w_ic_i\|\big)$
$= 0$, by (12) and (2.3.5).

From (24) and (21) one obtains, in a fashion similar to (23), that

(25) $A_4 = o_p(1)$.

This completes the proof of the theorem. □


From a practical point of view, it is worthwhile to state the a.u.l. result in the i.i.d. case separately. Accordingly, we have

Theorem 3.2.2. Suppose that $X_{n1}, \dots, X_{nn}$ are i.i.d. $F$. In addition, suppose that (F1), (F2), (N3), (2.3.4) and (2.3.5) with $d_{ni} = w_{ni}$ hold. Then, $\forall\, 0 < B < \infty$,

(26) $\sup_{0\le t\le 1,\ \|u\|\le B} |Z_d(t,u) - Z_d(t,0) - u'\sum_i w_{ni}c_{ni}\, q(t)| = o_p(1)$,

(27) $\sup_{\varphi\in\mathcal{C},\ \|u\|\le B} |T_d(\varphi, u) - T_d(\varphi, 0) + u'\sum_i w_{ni}c_{ni}\int q\, d\varphi| = o_p(1)$,

where $q := f(F^{-1})$.
Proof. Let $p := \sum_i w_{ni}c_{ni}$. From (7),

(28) l.h.s. (26) $= \sup|Z_w(t,u) - Z_w(t,0) - u'p\, q(t)|$.

Take $F_{ni} \equiv F$ in Theorem 3.2.1. Then (F1) and (F2) imply that $q$ is uniformly continuous on $[0,1]$ and ensure the satisfaction of all assumptions pertaining to $F$ in Theorem 3.2.1. In addition, $\mu_w(t,0) = 0$, $0 \le t \le 1$. Thus, Theorem 3.2.1 is applicable and one obtains

$\sup|\mathcal{T}_w(t,u) - Y_w(t,0)| = o_p(1)$,

which in turn yields

(29) $\sup|\mathcal{T}_w(t,u) - \mathcal{T}_w(t,0)| = o_p(1)$.

From (10) and (28),

l.h.s. (26) $\le \sup\{|Z_w(t,u) - \mathcal{Z}_w(t,u)| + |Z_w(t,0) - \mathcal{Z}_w(t,0)| + |\mathcal{T}_w(t,u) - \mathcal{T}_w(t,0)| + |\mu_w(t,u) - u'p\, q(t)|\}$
$= o_p(1)$,

by (9), (10), (N3), (29) and Lemma 2.3.1 applied to $F_{ni} \equiv F$, $d_{ni} = w_{ni}$.

To conclude (27), observe that

l.h.s. (27) $\le \sup\{|Z_d(t,u) - Z_d(t,0) - u'p\, q(t)| + |u'p|\, |q((n+1)t/n) - q(t)|\}$
$= o_p(1)$,

by (26), the uniform continuity of $q$ and (2.3.5) with $d_{ni} = w_{ni}$. □


Remark 3.2.1. Theorem 3.2.2 continues to hold if $F$ depends on $n$, provided now that the $\{q_n\}$ are uniformly equicontinuous on $[0,1]$. □
Remark 3.2.2. An analogue of Theorem 3.2.2 was first proved in Koul (1970) under somewhat stronger conditions on various underlying entities. In Jurečková (1969) one finds yet another variant of (27) for a fixed but fairly general function $\varphi$ and with $p$ in $c_{ni}$ equal to 1. Because of the importance of the a.u.l. property of $T_d(\varphi, \cdot)$, it is worthwhile to compare Theorem 3.2.2 above with Theorem 3.1 of Jurečková (1969). For the sake of completeness we state it as


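Theorem 3.2.3 (Theorem 3.1, Jurečková (1969)). Let $X_{n1}, \dots, X_{nn}$ be i.i.d. $F$. In addition, assume the following:

(a) $F$ has an absolutely continuous density $f$ whose a.e. derivative $\dot f$ satisfies
$0 < I(f) < \infty$, $I(f) := \int (\dot f/f)^2\, dF$.
(b) $\{w_{ni}\}$ satisfy (N3).
(c) 1. $\sum_i (c_{ni} - \bar c_n)^2 \le M < \infty$ (recall here $c_{ni}$ is $1\times 1$).
2. $\max_i (c_{ni} - \bar c_n)^2 = o(1)$, $\bar c_n := n^{-1}\sum_{i=1}^n c_{ni}$.
(d) $\varphi$ is a nondecreasing function on $(0,1)$ with
$\int_0^1 (\varphi(t) - \bar\varphi)^2\, dt > 0$, $\bar\varphi := \int_0^1 \varphi(u)\, du$.
(e) $(d_{ni} - d_{nj})(c_{ni} - c_{nj}) \ge 0$, $\forall\, 1 \le i, j \le n$.

Then, $\forall\, 0 < B < \infty$,

(f) $\sup_{|u|\le B} |T_d(\varphi, u) - T_d(\varphi, 0) + u\sum_i w_{ni}c_{ni}\, b(\varphi, f)| = o_p(1)$,

where $b(\varphi, f) := -\int \varphi(F(x))\, \dot f(x)\, dx$.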

The strongest point of Theorem 3.2.3 is that it allows for unbounded score functions, such as the "Normal scores", which correspond to $\varphi = \Phi^{-1}$, $\Phi$ being the d.f. of a $N(0,1)$ r.v. However, this is balanced by requiring (a), (c1) and (e). Note that (b) and (c1) together imply (2.3.5) with $d_{ni} = w_{ni}$, $1 \le i \le n$. Moreover, Theorem 3.2.2 does not require anything like (e).
Claim 3.2.1. (a) implies that $f$ is Lip(1/2).

First, from Hájek and Šidák (1967), pp. 19–20, we recall that (a) implies that $f(x) \to 0$ as $x \to \pm\infty$. Now, absolute continuity and nonnegativity of $f$ imply that

$f(x) \le \int_{-\infty}^x |\dot f/f|\, dF$, $x \in \mathbb{R}$.

Therefore, by the Cauchy–Schwarz inequality, for $x < y$,

(i) $|f(x) - f(y)| \le \{\int_x^y (\dot f/f)^2\, dF\}^{1/2}\, [F(y) - F(x)]^{1/2}$
(ii) $\le I^{1/2}(f)\, [F(y) - F(x)]^{1/2}$.

Letting $y \to \infty$ in (ii) yields

(iii) $\|f\|_\infty \le I^{1/2}(f)$.

Now (i) and (iii) together imply

$|f(x) - f(y)| \le I^{1/2}(f)\{\int_x^y f(t)\, dt\}^{1/2} \le I^{3/4}(f)\, (y - x)^{1/2}$.

A similar inequality holds for $x > y$, thereby giving

$|f(x) - f(y)| \le I^{3/4}(f)\, |y - x|^{1/2}$, $\forall\, x, y \in \mathbb{R}$,

and proving the claim. Consequently, (a) implies (F1).
Note that $f$ can be uniformly continuous, bounded, positive a.e., yet need not satisfy $I(f) < \infty$. For example, consider

$f(x) := (1 - x)/2$, $0 \le x \le 1$,
$\quad := (x - 2j + 1)/2^{j+2}$, $2j - 1 \le x \le 2j$,
$\quad := (2j + 1 - x)/2^{j+2}$, $2j \le x \le 2j + 1$, $j \ge 1$;
$f(x) := f(-x)$, $x < 0$.
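The counterexample can be checked numerically. Because $f$ is piecewise linear with knots at the integers, the trapezoidal rule on the integer grid integrates it exactly, and the mass on $[-(2J+1), 2J+1]$ is $1 - 2^{-(J+1)}$, so $f$ is indeed a density. Near each zero of $f$ the integrand $\dot f^2/f$ behaves like a constant divided by the distance to the zero, so the Fisher information diverges logarithmically. The sketch checks this on the first piece, where $\int_0^{1-\epsilon} \dot f^2/f\, dx = \tfrac12\ln(1/\epsilon)$.

```python
import math

def f(x):
    """The density of the example: continuous, bounded, positive a.e., with I(f) = infinity."""
    x = abs(x)                                  # f(x) := f(-x) for x < 0
    if x <= 1.0:
        return (1.0 - x) / 2.0
    j = int((x + 1.0) // 2.0)                   # x lies in [2j - 1, 2j + 1]
    if x <= 2 * j:
        return (x - 2 * j + 1) / 2.0 ** (j + 2)
    return (2 * j + 1 - x) / 2.0 ** (j + 2)

def mass(J):
    """Exact integral of f over [-(2J+1), 2J+1] via the trapezoidal rule on integer knots."""
    L = 2 * J + 1
    return sum((f(k) + f(k + 1)) / 2.0 for k in range(-L, L))

def fisher_first_piece(eps, m=20000):
    """int_0^{1-eps} fdot(x)^2 / f(x) dx with fdot = -1/2, using y = 1 - x and a geometric grid."""
    ratio = (1.0 / eps) ** (1.0 / m)
    total, y = 0.0, eps
    for _ in range(m):
        y_next = y * ratio
        mid = math.sqrt(y * y_next)             # geometric midpoint of [y, y_next]
        total += 0.25 / f(1.0 - mid) * (y_next - y)
        y = y_next
    return total

print(mass(20), 1 - 2.0 ** -21)
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, fisher_first_piece(eps), 0.5 * math.log(1 / eps))
```

The geometric grid is used because the integrand blows up like $1/(2y)$ as $y = 1 - x \to 0$, which a uniform grid resolves poorly.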


The above discussion shows that both Theorems 3.2.2 and 3.2.3 are needed; neither displaces the other. If one is interested in the a.u.l. property of, say, Normal scores type rank statistics, then Theorem 3.2.3 gives an answer. On the other hand, if one is interested in the a.u.l. property of, say, the Wilcoxon type rank statistics, then Theorem 3.2.2 provides a better result.


The proof of Theorem 3.2.3 uses contiguity and the projection technique à la Hájek (1962) to approximate $T_d(\varphi, u)$ for each fixed $u$. Then condition (e) implies the monotonicity of $T_d(\varphi, \cdot)$, which yields the uniformity with respect to $u$. Such a proof is harder to extend to the case where $u$ and $c_{ni}$ are $p\times 1$ vectors; this has been done by Jurečková (1971).

The proof of Theorem 3.2.2 exploits the monotonicity inherent in the w.e.p.'s $Y_d$ and certain smoothness properties of $F$. It would be desirable to extend this proof to include unbounded $\varphi$. □


We now return to Theorem 3.2.1 with general $\{F_{ni}\}$. We wish to state an a.u.l. theorem for $\{Z_d\}$ and $\{T_d(\varphi, \cdot)\}$ under general $\{F_{ni}\}$. Theorem 3.2.1 still does not quite do it because $u$ appears inside the $\mu_w$-expressions. We need to carry out an expansion of these terms in order to recover a term that is linear in $u$. To that effect we have


Lemma 3.2.2. In addition to the assumptions of Theorem 3.2.1, suppose that

(30) $n^{-1/2}\sum_i \|c_{ni}\| = O(1)$.

Then, $\forall\, 0 < B < \infty$,

(31) $\sup_{0\le t\le 1,\ \|u\|\le B} |n^{1/2}(HH_{nu}^{-1}(t) - t) + Y_1(t,0) + u'\nu_1(t)| = o_p(1)$,

where $Y_1$, $\nu_1$, etc. are $Y_d$, $\nu_d$ of (2.3.1), (2.3.8) with $d_{ni} \equiv n^{-1/2}$. Consequently,

(32) $\sup_{0\le t\le 1,\ \|u\|\le B} |n^{1/2}(HH_{nu}^{-1}(t) - t)| = O_p(1)$.
0<t<1, ||ul|<B 


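Proof. Write $Y_1(\cdot)$, $\mu_1(\cdot)$ for $Y_1(\cdot, 0)$, $\mu_1(\cdot, 0)$, respectively. Let $I$ denote the identity function and set $\Delta_{nu} := n^{1/2}(H_{nu}H_{nu}^{-1} - I)$. Then,

(33) $n^{1/2}(HH_{nu}^{-1} - I) = n^{1/2}(HH_{nu}^{-1} - H_uH_{nu}^{-1} + H_uH_{nu}^{-1} - H_{nu}H_{nu}^{-1}) + \Delta_{nu}$
$= -[\mu_1(HH_{nu}^{-1}, u) - \mu_1(HH_{nu}^{-1}) - u'\nu_1(HH_{nu}^{-1})] + \Delta_{nu}$
$\quad - u'[\nu_1(HH_{nu}^{-1}) - \nu_1] - u'\nu_1 - Y_1$
$\quad - [Y_1(HH_{nu}^{-1}, u) - Y_1(HH_{nu}^{-1})] - [Y_1(HH_{nu}^{-1}) - Y_1]$.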
Now, note that $\sup|\Delta_{nu}(t)| \le n^{-1/2}$. Hence

(34) $\sup|n^{1/2}(HH_{nu}^{-1}(t) - t) + Y_1(t) + u'\nu_1(t)|$
$\le \sup|\mu_1(t,u) - \mu_1(t) - u'\nu_1(t)| + B\sup\|\nu_1(HH_{nu}^{-1}(t)) - \nu_1(t)\|$
$\quad + \sup|Y_1(t,u) - Y_1(t)| + \sup|Y_1(HH_{nu}^{-1}(t)) - Y_1(t)| + n^{-1/2}$,

where we have used the fact that $Y_1(t) = W(t)$ of (2.2a.33). The first term on the r.h.s. of (34) tends to zero by Lemma 2.3.1 when applied with $d_{ni} \equiv n^{-1/2}$. The third term tends to zero in probability by (2.3.25) applied with $d_{ni} \equiv n^{-1/2}$. To show that the other two terms go to zero in probability, use Lemma 3.2.1, (2.2a.35) and an analogue of (24) for $\nu_1$, and an argument similar to the one that yielded (23) and (25) above. Thus we have (31). Since $\sup_{t,u}|Y_1(t,0) + u'\nu_1(t)| = O_p(1)$, (32) follows. □


Lemma 3.2.3. In addition to the assumptions of Theorem 3.2.1 and (30), suppose that for every $0 < k < \infty$,

(35) $\max_i \sup_{|t-s|\le kn^{-1/2}} n^{1/2}|L_{ni}(t) - L_{ni}(s) - (t - s)\ell_{ni}(s)| = o(1)$,

where $L_{ni} := F_{ni}(H^{-1})$, $\ell_{ni} := f_{ni}(H^{-1})/h(H^{-1})$, $1 \le i \le n$, with $h := n^{-1}\sum_{i=1}^n f_{ni}$. Moreover, suppose that, with $\omega(t) := n^{-1}\sum_{i=1}^n w_{ni}\ell_{ni}(t)$, $0 \le t \le 1$,

(36) $\sup_{0\le t\le 1} n^{1/2}|\omega(t)| = O(1)$.

Then, $\forall\, 0 < B < \infty$,

(37) $\sup|\mu_w(HH_{nu}^{-1}(t)) - \mu_w(t) + \{Y_1(t) + u'\nu_1(t)\}\, n^{1/2}\omega(t)| = o_p(1)$,

where $\mu_w(t)$, $Y_1(t)$ stand for $\mu_w(t,0)$, $Y_1(t,0)$, respectively, and where the supremum is taken over $0 \le t \le 1$, $u \in \mathcal{N}(B)$.


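Proof. Let $M_u := \mu_w(HH_{nu}^{-1}) - \mu_w$. From (32) it follows that $\forall\, \epsilon > 0$ $\exists\, K_\epsilon$ and $N_{1\epsilon}$ such that

(38) $P(A_n) \ge 1 - \epsilon$, $n \ge N_{1\epsilon}$,

where

$A_n := [\sup_{t,u}|HH_{nu}^{-1}(t) - t| \le K_\epsilon n^{-1/2}]$.

By assumption (35), there exists $N_{2\epsilon}$ such that $n \ge N_{2\epsilon}$ implies

(39) $\max_i \sup_{|t-s|\le K_\epsilon n^{-1/2}} n^{1/2}|L_i(t) - L_i(s) - (t - s)\ell_i(s)| \le \epsilon$.

Define

$Z_{ui} := \{L_i(HH_{nu}^{-1}) - L_i - [HH_{nu}^{-1} - I]\ell_i\}\, I(A_n)$, $1 \le i \le n$.

In view of (39) and (38),

(40) $P(\max_i \sup_{t,u} n^{1/2}|Z_{ui}(t)| > \epsilon) \le \epsilon$, $n \ge N_{1\epsilon} \vee N_{2\epsilon} =: N_\epsilon$.

Moreover,

(41) $M_u = M_uI(A_n) + M_uI(A_n^c)$
$= \sum_i w_iZ_{ui} + Z_{u0} + n^{1/2}(HH_{nu}^{-1} - I)\, n^{1/2}\omega$,

where

$Z_{u0} := \{M_u - n^{1/2}(HH_{nu}^{-1} - I)\, n^{1/2}\omega\}\, I(A_n^c)$.

Note that

(42) $P(\sup_{t,u}|Z_{u0}| \ne 0) \le P(A_n^c) \le \epsilon$, $n \ge N_\epsilon$.

By the C-S inequality, (N3) and (40),

(43) $P(\sup_{t,u}|\sum_i w_iZ_{ui}(t)| > \epsilon) \le \epsilon$, $n \ge N_\epsilon$.

Hence, (37) follows from (43), (42), (41), Lemma 3.2.2 and (36). □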


We combine Theorem 3.2.1 and Lemmas 3.2.2 and 3.2.3 to obtain the following

Theorem 3.2.4. Under the notation and assumptions of Theorem 3.2.1 and Lemmas 3.2.2 and 3.2.3, $\forall\, 0 < B < \infty$,

(44) $\sup|Z_d(t,u) - Z_d(t,0) - u'\sum_i (d_{ni} - \bar d(t))\, c_{ni}q_{ni}(t)| = o_p(1)$,

(45) $\sup|T_d(\varphi, u) - T_d(\varphi, 0) + u'\int \sum_i (d_{ni} - \bar d(t))\, c_{ni}q_{ni}(t)\, d\varphi(t)| = o_p(1)$,

where the supremum in (44) is over $0 \le t \le 1$, $\|u\| \le B$, in (45) over $\varphi \in \mathcal{C}$, $\|u\| \le B$, and where $\bar d(t) := n^{-1}\sum_i d_{ni}\ell_{ni}(t)$, $q_{ni}(t) := f_{ni}(H^{-1}(t))$, $0 \le t \le 1$, $1 \le i \le n$.


Proof. Let $p(t) := \sum_i (d_i - \bar d(t))\, c_iq_i(t)$. Note that the fact $n^{-1}\sum_i \ell_i(t) \equiv 1$ implies $p(t) = \sum_i (w_i - \bar w(t))\, c_iq_i(t)$, where $\{w_i\}$ are as in (6) and $\bar w(t) := n^{-1}\sum_i w_i\ell_i(t) = \omega(t)$. By (7) and (9),

(46) l.h.s. (44) $= \sup_{0\le t\le 1,\ \|u\|\le B} |Z_w(t,u) - Z_w(t,0) - u'p(t)|$
$\le 4\max_i|w_i| + \sup_{0\le t\le 1,\ \|u\|\le B} |\mathcal{Z}_w(t,u) - \mathcal{Z}_w(t,0) - u'p(t)|$.

Now, from Theorem 3.2.1 and Lemma 3.2.3, uniformly in $0 \le t \le 1$, $\|u\| \le B$,

(47) $\sup|\mathcal{T}_w(t,u) - Y_w(t) + \{Y_1(t) + \nu_1(t)'u\}\, n^{1/2}\omega(t)| = o_p(1)$,

where $Y_d(t)$ stands for $Y_d(t,0)$ for arbitrary weights $\{d_{ni}\}$. Therefore,

$\sup|\mathcal{Z}_w(t,u) - \mathcal{Z}_w(t,0) - u'p(t)|$
$= \sup|\mathcal{T}_w(t,u) - \mathcal{T}_w(t,0) + \mu_w(t,u) - \mu_w(t,0) - u'p(t)|$
$\le \sup|\mathcal{T}_w(t,u) - \mathcal{T}_w(t,0) + u'\nu_1(t)\, n^{1/2}\omega(t)|$
$\quad + \sup|\mu_w(t,u) - \mu_w(t,0) - u'\nu_w(t)| = o_p(1)$,

by (47), Lemma 2.3.1 and the fact that $p(t) = \nu_w(t) - \nu_1(t)\, n^{1/2}\omega(t)$. This completes the proof of (44). The proof of (45) follows from (44) in the same fashion as does that of (27) from (26). □


Remark 3.2.3. As in Remark 2.2a.3, suppose we strengthen (N3) to require

(B1) $n\max_i w_{ni}^2 = O(1)$.

Then (C*) and (36) are a priori satisfied by $L_w$. □
Remark 3.2.4. If one is interested in the i.i.d. case only, then Theorem 3.2.2 gives a better result than Theorem 3.2.4. □
3.3. A.U.L. OF LINEAR SIGNED RANK STATISTICS 


In this section our aim is to prove analogs of Theorems 3.2.2 and 3.2.4 for the signed rank processes $\{T_d^+(\psi, u), u \in \mathbb{R}^p\}$, using as many results from the previous sections as possible. Many details are quite similar. Define, for $u \in \mathbb{R}^p$, $0 \le t \le 1$, $x \ge 0$,
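(1) $Z_d^+(t, u) := \sum_i d_{ni} I(R_{iu}^+ \le nt)\, s(X_{ni} - c_{ni}'u)$,
$J_{nu}(x) := n^{-1}\sum_i I(|X_{ni} - c_{ni}'u| \le x) = H_{nu}(x) - H_{nu}(-x)$,
$J_u(x) := n^{-1}\sum_i [F_{ni}(x + c_{ni}'u) - F_{ni}(-x + c_{ni}'u)] = H_u(x) - H_u(-x)$,
$\mathcal{Z}_d^+(t, u) := \sum_i d_{ni} I(|X_{ni} - c_{ni}'u| \le J_{nu}^{-1}(t))\, s(X_{ni} - c_{ni}'u)$,
$S_d^+(t, u) := \sum_i d_{ni} I(|X_{ni} - c_{ni}'u| \le J^{-1}(t))\, s(X_{ni} - c_{ni}'u)$,
$\mu_d^+(t, u) := \sum_i d_{ni}\, p_{ni}(t, u) = E S_d^+(t, u)$,
$p_{ni}(t, u) := F_{ni}(J^{-1}(t) + c_{ni}'u) + F_{ni}(-J^{-1}(t) + c_{ni}'u) - 2F_{ni}(c_{ni}'u)$, $1 \le i \le n$.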


In the above and sequel, $J$ and $J_n$ stand for $J_0$ and $J_{n0}$, respectively. We also need


(2) $Y_d^+(t, u) := S_d^+(t, u) - \mu_d^+(t, u)$,
and
(3) $T_d^+(t, u) := \mathcal{Z}_d^+(t, u) - \mu_d^+(t, u)$, $0 \le t \le 1$, $u \in \mathbb{R}^p$.
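The definitions in (1) through (3) are easy to evaluate numerically. The following Python sketch (NumPy assumed; the function name `signed_rank_processes` is hypothetical) computes $Z_d^+(t, u)$ and $\mathcal{Z}_d^+(t, u)$ at one $t \in (0, 1]$, taking $s(\cdot)$ to be the sign function and assuming no ties among the $|X_{ni} - c_{ni}'u|$. It is an illustration only, not code from the text.

```python
import numpy as np


def signed_rank_processes(X, C, d, u, t):
    """Evaluate Z_d^+(t, u) and calZ_d^+(t, u) of (1) at a single t in (0, 1].

    X: (n,) observations, C: (n, p) matrix with rows c_ni', d: (n,) weights,
    u: (p,) shift. Assumes no ties among |X_ni - c_ni'u| (continuous d.f.'s).
    """
    X, C, d, u = (np.asarray(v, dtype=float) for v in (X, C, d, u))
    r = X - C @ u                           # X_ni - c_ni'u
    n = r.size
    a = np.abs(r)
    s = np.sign(r)                          # s(.) taken to be the sign function
    ranks = np.argsort(np.argsort(a)) + 1   # R^+_iu: rank of |r_i| among |r_1|, ..., |r_n|
    Z_plus = float(np.sum(d * (ranks <= n * t) * s))
    # J_nu^{-1}(t) is the ceil(n t)-th smallest of |r_1|, ..., |r_n|
    k = int(np.ceil(n * t))
    q = np.sort(a)[k - 1]
    calZ_plus = float(np.sum(d * (a <= q) * s))
    return Z_plus, calZ_plus
```

Since the events $\{R_{iu}^+ \le nt\}$ and $\{|X_{ni} - c_{ni}'u| \le J_{nu}^{-1}(t)\}$ can disagree for at most one index, the two values differ by at most $\max_i |d_{ni}|$, in line with the bound $2\max_i |d_{ni}|$ used in the proof of Theorem 3.3.2.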


Analogous to (3.2.11), we have the basic decomposition: for $0 \le t \le 1$, $u \in \mathbb{R}^p$,
(4) $T_d^+(t, u) = Y_d^+(JJ_{nu}^{-1}(t), u) + \mu_d^+(JJ_{nu}^{-1}(t), u) - \mu_d^+(t, u)$.
Now, note that, w.p. 1, for all $0 \le t \le 1$, $u \in \mathbb{R}^p$,


(5) $Y_d^+(t, u) = Y_d(HJ^{-1}(t), u) + Y_d(H(-J^{-1}(t)), u) - 2Y_d(H(0), u)$,


where $Y_d$ is as in (2.3.1). Therefore, by Theorem 2.3.1 (see (2.3.25)), under the assumptions of that theorem and the strictly increasing nature of $J$ and $H$,
(6) $\sup_{t,u} |Y_d^+(t, u) - Y_d^+(t, 0)| = o_p(1)$.
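Relations (5) and (20) rest on pointwise identities between indicators that hold for every $x$ outside the null set $\{0, \pm a\}$; this is where the continuity of the $F_{ni}$ enters. A minimal NumPy check of both identities on simulated continuous data (function names hypothetical):

```python
import numpy as np


def signed_indicator(x, a):
    """I(|x| <= a) s(x), the summand of S_d^+ with s(.) the sign function."""
    return (np.abs(x) <= a) * np.sign(x)


def decomposition_5(x, a):
    """I(x <= a) + I(x <= -a) - 2 I(x <= 0), the right side behind (5)."""
    return (x <= a).astype(float) + (x <= -a) - 2.0 * (x <= 0)


def decomposition_20(x, a):
    """I(x <= a) - I(x <= -a), the right side behind (20) for I(|x| <= a)."""
    return (x <= a).astype(float) - (x <= -a)
```

Replacing $x$ by $X_{ni} - c_{ni}'u$, multiplying by $d_{ni}$, summing and centering gives (5) and (20) term by term.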


One also has, in view of the continuity of $\{F_{ni}\}$, a relation like (5) between $\mu_d^+$ and $\mu_d$. Thus by Lemma 2.3.1, under the assumptions there,


(7) $\sup_{t,u} |\mu_d^+(t, u) - \mu_d^+(t, 0) - u'\nu_d^+(t)| = o(1)$,
where
(8) $\nu_d^+(t) := \sum_i d_{ni} c_{ni} [f_{ni}(J^{-1}(t)) + f_{ni}(-J^{-1}(t)) - 2f_{ni}(0)]$, $0 \le t \le 1$.
We also have an analogue of Lemma 3.2.1:
Lemma 3.3.1. Without any assumption except (2.2a.34),
(9) $\sup_{0 \le x < \infty} |J_n(x) - J(x)| \to 0$ a.s.
If, in addition, (2.3.4) and (3.2.15) hold, then, for every $0 < B < \infty$,


(10) $\sup_{0 \le x < \infty,\ \|u\| \le B} |J_{nu}(x) - J_u(x)| \to 0$ a.s.


Using this lemma, arguments like those in Theorem 3.2.1 and the above 
discussion, one obtains 


Theorem 3.3.1. Suppose that $\{X_{ni}, F_{ni}\}$ satisfy (2.2a.34), (2.3.3b) and that $\{d_{ni}, c_{ni}\}$ satisfy (N1), (N2), (2.3.4) and (2.3.5). In addition, assume that


(11) $\lim_{\delta \to 0} \limsup_n \max_i \sup_{|J(x) - J(y)| \le \delta} |f_{ni}(x) - f_{ni}(y)| = 0$,
and that $H$ is strictly increasing for every $n$. Then, for every $0 < B < \infty$,
(12) $\sup_{0 \le t \le 1,\ \|u\| \le B} |T_d^+(t, u) - Y_d^+(t, 0) - \mu_d^+(JJ_{nu}^{-1}(t), 0) + \mu_d^+(t, 0)| = o_p(1)$. □


We remark here that (11) implies (3.2.12). 
Next, note that if $\{F_{ni}\}$ are symmetric about 0, then


(13) $\mu_d^+(t, 0) = 0$, $0 \le t \le 1$, $n \ge 1$.


Upon combining (13), (12) with (7) one obtains 
Theorem 3.3.2. In addition to the assumptions of Theorem 3.2.1, suppose that $\{F_{ni}, 1 \le i \le n\}$ are symmetric about 0. Then, for every $0 < B < \infty$,


(14) $\sup_{0 \le t \le 1,\ \|u\| \le B} |Z_d^+(t, u) - Z_d^+(t, 0) - u'\sum_i d_{ni} c_{ni} \nu_{ni}^+(t)| = o_p(1)$,


(15) $\sup_{\psi \in \mathcal{C},\ \|u\| \le B} |T_d^+(\psi, u) - T_d^+(\psi, 0) + u'\sum_i d_{ni} c_{ni} \int \nu_{ni}^+(t)\, d\psi^+(t)| = o_p(1)$,


where
$\nu_{ni}^+(t) := 2[f_{ni}(J^{-1}(t)) - f_{ni}(0)]$, $1 \le i \le n$, $0 \le t \le 1$.


Proof. Using a relation like (3.2.5) between $R_{iu}^+$ and $J_{nu}$, one obtains, as in (3.2.9),


(16) $\sup_{t,u} |Z_d^+(t, u) - \mathcal{Z}_d^+(t, u)| \le 2\max_i |d_{ni}| = o(1)$, by (N2).


Thus (14) follows from (16), (12), (13) and (7). Conclusion (15) follows from (14) in the same way as (3.2.27) follows from (3.2.26). □


Because of the importance of the i.i.d. symmetric case, we specialize 
the above theorem to yield 


Corollary 3.3.1. Let $F$ be a d.f. symmetric around zero, satisfying (F1), (F2), and let $X_{n1}, \ldots, X_{nn}$ be i.i.d. $F$. In addition, assume that $\{d_{ni}, c_{ni}\}$ satisfy (N1), (N2), (2.3.4) and (2.3.5). Then, for every $0 < B < \infty$,


(17) $\sup_{0 \le t \le 1,\ \|u\| \le B} |Z_d^+(t, u) - Z_d^+(t, 0) - u'\sum_i d_{ni} c_{ni} q^+(t)| = o_p(1)$,


(18) $\sup_{\psi \in \mathcal{C},\ \|u\| \le B} |T_d^+(\psi, u) - T_d^+(\psi, 0) + \sum_i d_{ni} c_{ni}'u \int q^+(t)\, d\psi^+(t)| = o_p(1)$,


where $q^+(t) := 2[f(F^{-1}((t+1)/2)) - f(0)]$, $0 \le t \le 1$. □
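For a concrete symmetric $F$, the function $q^+$ of Corollary 3.3.1 can be computed directly. The sketch below assumes $F$ is the $N(0,1)$ d.f. (any `NormalDist` with mean 0 would do) and uses only Python's standard `statistics.NormalDist`; the function names are hypothetical. It also returns the coefficient of a scalar $u$ ($p = 1$) in (17), namely $\sum_i d_{ni} c_{ni} q^+(t)$.

```python
from statistics import NormalDist


def q_plus(t, dist=NormalDist()):
    """q^+(t) = 2[f(F^{-1}((t + 1)/2)) - f(0)] for t in [0, 1).

    dist must be symmetric about 0 (e.g. NormalDist(0, sigma)); for such F,
    J(x) = 2F(x) - 1, so J^{-1}(t) = F^{-1}((t + 1)/2).
    """
    x = dist.inv_cdf((t + 1.0) / 2.0)
    return 2.0 * (dist.pdf(x) - dist.pdf(0.0))


def linear_coefficient(d, c, t, dist=NormalDist()):
    """Coefficient of the scalar u (p = 1) in (17): sum_i d_ni c_ni q^+(t)."""
    return sum(di * ci for di, ci in zip(d, c)) * q_plus(t, dist)
```

For a unimodal symmetric $F$, $q^+(t) \le 0$ and $q^+$ decreases in $t$, so the linear term in (17) has the opposite sign to $u\sum_i d_{ni}c_{ni}$.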


Remark 3.3.1. Van Eeden (1972) proved an analogue of (18) without the supremum over $\psi$, but for square integrable $\psi$'s. She also needs conditions like those in Theorem 3.2.3 above. Thus Remark 3.2.1 is equally applicable here when comparing Corollary 3.2.1 with Van Eeden's results. □


Now, we return to Theorem 3.3.1 and expand the $\mu_d^+$-terms further so as to recover an extra linearity term. Define, for $0 \le t \le 1$, $u \in \mathbb{R}^p$,
(19) $Y_d^*(t, u) := \sum_i d_{ni}\{I(|X_{ni} - c_{ni}'u| \le J^{-1}(t)) - F_{iu}^*(J^{-1}(t))\}$,
$\nu_d^*(t) := \sum_i d_{ni} c_{ni} [f_{ni}(J^{-1}(t)) - f_{ni}(-J^{-1}(t))]$,
where
$F_{iu}^*(x) := F_{ni}(x + c_{ni}'u) - F_{ni}(-x + c_{ni}'u)$, $x \ge 0$.
Note the relation: for arbitrary $\{d_{ni}\}$,
(20) $Y_d^*(t, u) = Y_d(HJ^{-1}(t), u) - Y_d(H(-J^{-1}(t)), u)$.
From (20) and (2.3.25) applied with $d_{ni} \equiv n^{-1/2}$ we obtain


(21) $\sup_{t,u} |Y_1^*(t, u) - Y_1^*(t, 0)| = o_p(1)$.


Note that in the case $d_{ni} \equiv n^{-1/2}$, (2.3.5) reduces to (3.2.30). Next, under (11) and (2.3.5), just as (3.2.24),


(22) $\lim_{\delta \to 0} \limsup_n \sup_{|t - s| \le \delta} \|\nu_d^*(t) - \nu_d^*(s)\| = 0$,


for the given $\{d_{ni}\}$ and for $d_{ni} \equiv n^{-1/2}$.
Using (21), (22) and calculations similar to those done in the proof of 
Lemma 3.2.2, we obtain 


Lemma 3.3.2. Under the conditions of Theorem 3.2.1 and (3.2.30),


(23) $\sup_{t,u} |n^{1/2}(JJ_{nu}^{-1}(t) - t) + Y_1^*(t, 0) + u'\nu_1^*(t)| = o_p(1)$.
Consequently,
(24) $\sup_{t,u} |n^{1/2}(JJ_{nu}^{-1}(t) - t)| = O_p(1)$. □


Arguing similarly as in Lemma 3.2.3, we obtain the following Lemma 3.3.3. In it, $\mu_d(t)$, $p_{ni}(t)$, etc. stand for $\mu_d^+(t, 0)$, $p_{ni}(t, 0)$, etc. of (1).


Lemma 3.3.3. In addition to the assumptions of Theorem 3.2.1 and (3.2.30), assume that for every $0 < k < \infty$,


(25) $\max_i \sup_{|t - s| \le k n^{-1/2}} n^{1/2} |p_{ni}(t) - p_{ni}(s) - (t - s)\ell_{ni}^+(s)| = o(1)$,
where $\{p_{ni}\}$ are as in (1),
(26) $\ell_{ni}^+(s) := [f_{ni}(J^{-1}(s)) - f_{ni}(-J^{-1}(s))] / h^*(J^{-1}(s))$, $0 \le s \le 1$,
$h^*(x) := n^{-1}\sum_i [f_{ni}(x) + f_{ni}(-x)]$, $x \ge 0$ (so that $h^*$ is the density of $J$).


Moreover, with $\bar d_n^+(t) := n^{-1}\sum_i d_{ni} \ell_{ni}^+(t)$, $0 \le t \le 1$, assume that


(27) $\sup_{0 \le t \le 1} |n^{1/2} \bar d_n^+(t)| = O(1)$.
Then,
(28) $\sup |\mu_d(JJ_{nu}^{-1}(t)) - \mu_d(t) + \{Y_1^*(t) + u'\nu_1^*(t)\} n^{1/2} \bar d_n^+(t)| = o_p(1)$,
where $Y_1^*(t) := Y_1^*(t, 0)$ and the supremum is taken over the set $0 \le t \le 1$, $\|u\| \le B$. □


Finally, an analogue of Theorem 3.2.3 is 


Theorem 3.3.3. Under the assumptions of Theorem 3.3.1, (3.2.30), (25) and (27), for every $0 < B < \infty$,


(29) $\sup_{0 \le t \le 1,\ \|u\| \le B} |Z_d^+(t, u) - Z_d^+(t, 0) - u'[\nu_d^+(t) - \nu_1^*(t) n^{1/2} \bar d_n^+(t)]| = o_p(1)$,


(30) $\sup |T_d^+(\psi, u) - T_d^+(\psi, 0) + u'\int [\nu_d^+(t) - \nu_1^*(t) n^{1/2} \bar d_n^+(t)]\, d\psi^+(t)| = o_p(1)$,


where the supremum in (30) is over $\psi \in \mathcal{C}$, $\|u\| \le B$. □


Remark 3.3.2. Unlike the case in Theorem 3.2.3, there does not appear to be a nice simplification of the term $\nu_d^+ - \nu_1^* n^{1/2} \bar d_n^+$. However, it can be rewritten as follows:


$\nu_d^+(t) - \nu_1^*(t) n^{1/2} \bar d_n^+(t) = \sum_i d_i c_i [f_i(J^{-1}(t)) + f_i(-J^{-1}(t)) - 2f_i(0)]$
$\quad + \sum_i (d_i - \bar d_n^+(t)) c_i [f_i(J^{-1}(t)) - f_i(-J^{-1}(t))]$.


This representation is somewhat revealing in the following sense. The first term is due to the shift $u'c_i$ in the r.v. $X_i$, and the second term is due to the nonidentical and asymmetric nature of the distribution of $X_i$, $1 \le i \le n$. □


Remark 3.3.3. If one is interested in the symmetric case or in the i.i.d. symmetric case, then Theorem 3.3.2 and Corollary 3.3.1, respectively, give better results than Theorem 3.3.3. □
3.4. WEAK CONVERGENCE OF RANK AND SIGNED RANK W.E.P.'S


Throughout this section we shall use the notation of Sections 3.2 and 3.3 with $u = 0$. Thus, e.g., $Z_d(t)$, $Z_d^+(t)$, etc. will represent $Z_d(t, 0)$, $Z_d^+(t, 0)$, etc. of (3.2.2) and (3.3.1), i.e., for $0 \le t \le 1$,


(1) $Z_d(t) := \sum_i d_{ni} I(R_{ni} \le nt)$, $Z_d^+(t) := \sum_i d_{ni} I(R_{ni}^+ \le nt)\, s(X_{ni})$,
$\mathcal{Z}_d(t) := \sum_i d_{ni} I(X_{ni} \le H_n^{-1}(t))$, $\mu_d(t) := \sum_i d_{ni} L_{ni}(t)$,


where $R_{ni}$ ($R_{ni}^+$) is the rank of $X_{ni}$ ($|X_{ni}|$) among $X_{n1}, \ldots, X_{nn}$ ($|X_{n1}|, \ldots, |X_{nn}|$).


We shall first prove the asymptotic normality of $Z_d$ and $Z_d^+$ for a fixed $t$, say $t = v$, $0 < v < 1$. To begin with, consider $Z_d(v)$. In the following theorem $v$ is a fixed number in $(0, 1)$.


Theorem 3.4.1. Suppose that $\{X_{ni}\}$, $\{F_{ni}\}$, $\{L_{ni}\}$, $L_d$ are as in (2.2a.33) and (2.2a.34). Assume that $\{d_{ni}\}$ satisfy (N1), (N2) and that $H$ is strictly increasing for each $n$. Also assume that


(2) $\lim_{\delta \to 0} \limsup_n [L_d(v + \delta) - L_d(v - \delta)] = 0$,


and that there are nonnegative numbers $\ell_{ni}(v)$, $1 \le i \le n$, such that for every $0 < k < \infty$,


(3) $\max_i \sup_{|t - v| \le k n^{-1/2}} n^{1/2} |L_{ni}(t) - L_{ni}(v) - (t - v)\ell_{ni}(v)| = o(1)$.


Denoting
(4) $\bar d_n(v) := n^{-1}\sum_i d_{ni} \ell_{ni}(v)$, $\sigma_d^2(v) := \sum_i (d_{ni} - \bar d_n(v))^2 L_{ni}(v)(1 - L_{ni}(v))$,


assume that


(5) $n^{1/2} |\bar d_n(v)| = O(1)$,
(6) $\liminf_n \sigma_d^2(v) > 0$.
Then,


$\{\sigma_d(v)\}^{-1}\{Z_d(v) - \mu_d(v)\} \to_d N(0, 1)$.
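In the i.i.d. case $F_{ni} \equiv F$ one has $L_{ni}(v) = v$ and $\ell_{ni}(v) \equiv 1$ (cf. Remark 3.4.8), so that $\mu_d(v) = v\sum_i d_{ni}$ and the quantities in (4) are explicit. The following sketch (NumPy assumed; names hypothetical) computes $Z_d(v)$ and the pair $(\bar d_n(v), \sigma_d^2(v))$ from given values $L_{ni}(v)$ and $\ell_{ni}(v)$.

```python
import numpy as np


def rank_process_at(X, d, v):
    """Z_d(v) = sum_i d_ni I(R_ni <= n v), with R_ni the rank of X_ni (no ties)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    R = np.argsort(np.argsort(X)) + 1
    return float(np.sum(d * (R <= X.size * v)))


def centering_and_scale(d, L_v, ell_v):
    """(dbar_n(v), sigma_d^2(v)) of (4), given arrays (or scalars) L_ni(v), ell_ni(v)."""
    d = np.asarray(d, dtype=float)
    L_v = np.asarray(L_v, dtype=float)
    ell_v = np.asarray(ell_v, dtype=float)
    dbar = float(np.sum(d * ell_v) / d.size)
    sigma2 = float(np.sum((d - dbar) ** 2 * L_v * (1.0 - L_v)))
    return dbar, sigma2
```

A Monte Carlo run with i.i.d. $N(0,1)$ data and centered weights is one way to see the standardization of Theorem 3.4.1 at work: the values $\{Z_d(v) - v\sum_i d_{ni}\}/\sigma_d(v)$ should have mean near 0 and variance near 1.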


The proof of Theorem 3.4.1 is a consequence of the following three 
lemmas. In these lemmas the setup is the same as in Theorem 3.4.1. 
Lemma 3.4.1. Under the sole assumption of (2.2a.34),
$\sup_{0 \le t \le 1} |HH_n^{-1}(t) - t| = o_p(1)$.
Proof. Upon taking $u = 0$ in (3.2.19), one obtains


$\sup_{0 \le t \le 1} |HH_n^{-1}(t) - t| \le \sup_{-\infty < x < \infty} |H_n(x) - H(x)| + n^{-1} = o_p(1)$,


by (3.2.14) of Lemma 3.2.1. □
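The inequality in the proof of Lemma 3.4.1 can be seen numerically in the simplest case of i.i.d. Uniform(0,1) observations, where $H(x) = x$ on $[0,1]$ and $H_n^{-1}(t)$ is the $\lceil nt \rceil$-th order statistic. The sketch below (NumPy assumed; names hypothetical) computes $\sup_t |HH_n^{-1}(t) - t|$ and the Kolmogorov distance $\sup_x |H_n(x) - H(x)|$.

```python
import numpy as np


def quantile_composition_error(U):
    """sup_t |H(H_n^{-1}(t)) - t| for a Uniform(0,1) sample (so H(x) = x on [0,1])."""
    u = np.sort(np.asarray(U, dtype=float))
    n = u.size
    k = np.arange(1, n + 1)
    # On ((k-1)/n, k/n], H_n^{-1}(t) = u_(k); the sup of |u_(k) - t| is at an endpoint.
    return float(np.max(np.maximum(np.abs(u - k / n), np.abs(u - (k - 1) / n))))


def ks_distance(U):
    """sup_x |H_n(x) - H(x)| for a Uniform(0,1) sample."""
    u = np.sort(np.asarray(U, dtype=float))
    n = u.size
    k = np.arange(1, n + 1)
    return float(np.max(np.maximum(k / n - u, u - (k - 1) / n)))
```

For each sample the first quantity is bounded by the second plus $1/n$, as in the display above.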


Lemma 3.4.2. Let $Y_d(t)$ denote the $Y_d(t, 0)$ of (2.3.1). Then, under (3), for every $\epsilon > 0$,


$\lim_{\delta \to 0} \limsup_n P(\sup_{|t - v| \le \delta} |Y_d(t) - Y_d(v)| > \epsilon) = 0$.


Proof. Apply Lemma 2.2a.2 to $\eta_{ni} = H(X_{ni})$, $G_{ni} = L_{ni}$, to obtain that $Y_d = W_d$ of that lemma and that


$P(\sup_{|t - v| \le \delta} |Y_d(t) - Y_d(v)| > \epsilon)$


$\le \kappa \epsilon^{-2} [L_d(v + \delta) - L_d(v - \delta)] + P(|Y_d(v - \delta) - Y_d(v)| > \epsilon/2) + P(|Y_d(v + \delta) - Y_d(v - \delta)| > \epsilon/4)$
$\le (\kappa + 20)\epsilon^{-2} [L_d(v + \delta) - L_d(v - \delta)]$ (by Chebyshev).
The Lemma now follows from the assumption (3). □
Lemma 3.4.3. Under (3), for every $\epsilon > 0$,
$\limsup_n P(|Y_d(HH_n^{-1}(v)) - Y_d(v)| > \epsilon) = 0$.
Proof. Follows from Lemmas 3.4.1 and 3.4.2. □


Remark 3.4.1. Lemma 3.4.2 could be deduced from Corollary 3.3.1, which gives the tightness of the process $Y_d$ under the stronger condition (C*). But here we are interested in the behavior of $Y_d$ only in the neighborhood of one point $v$, and the above lemma proves the continuity of $Y_d$ at the point $v$ at which (3) holds. Similarly, many of the approximations that follow could of course be deduced from the proofs of Theorems 3.2.1 and 3.2.2. But these theorems obtain results uniformly in $0 \le t \le 1$ under rather stronger conditions than would be needed in the present case. Of course, various decompositions used in their proofs will be useful here also. □
Proof of Theorem 3.4.1. In view of (3.2.9) and (N2), it suffices to prove that $\{\sigma_d(v)\}^{-1} T_d(v) \to_d N(0, 1)$, where


(7) $T_d(v) := \mathcal{Z}_d(v) - \mu_d(v)$.
But, from (3.2.11) applied with $u = 0$,
$T_d(v) = Y_d(HH_n^{-1}(v)) + \mu_d(HH_n^{-1}(v)) - \mu_d(v)$, w.p. 1,
(8) $\quad = Y_d(v) + o_p(1) + \mu_d(HH_n^{-1}(v)) - \mu_d(v)$, by Lemma 3.4.3.


Apply the identity (3.2.33) with $u = 0$ and Lemma 3.4.3 with $d_{ni} \equiv n^{-1/2}$ to obtain
(9) $n^{1/2}[HH_n^{-1}(v) - v] = -Y_1(HH_n^{-1}(v)) + o_p(1) = -Y_1(v) + o_p(1)$.


Since $E Y_1^2(v) = n^{-1}\sum_i L_{ni}(v)(1 - L_{ni}(v)) \le v(1 - v)$, $|Y_1(v)| = O_p(1)$. Again, argue as for (3.2.37) with $u = 0$, $t = v$ (i.e., without the supremum on the l.h.s.), to conclude that


(10) $\mu_d(HH_n^{-1}(v)) - \mu_d(v) = -Y_1(v)\, n^{1/2} \bar d_n(v) + o_p(1)$.
Combine (8), (9) and (10) to obtain
(11) $T_d(v) = Y_d(v) - n^{1/2} \bar d_n(v) Y_1(v) + o_p(1)$


$\quad = \sum_i (d_{ni} - \bar d_n(v))\{I(X_{ni} \le H^{-1}(v)) - L_{ni}(v)\} + o_p(1)$.


The theorem now follows from (6) and the fact that $\{\sigma_d(v)\}^{-1}$ times the leading term in the r.h.s. of (11) converges in distribution to $N(0, 1)$ by the L-F CLT, in view of (N1) and (N2). □


Remark 3.4.2. If $\{F_{ni}\}$ have densities $\{f_{ni}\}$, then $\ell_{ni}(v)$ can be taken to be $f_{ni}(H^{-1}(v))/h(H^{-1}(v))$, just as in (3.2.34). However, if one is interested in the asymptotic normality of the linear rank statistic corresponding to the jump score function, with jump at $v$, then we need $\{L_{ni}\}$ to be smooth only at that jump point.

The above Theorem 3.4.1 bears strong resemblance to Theorem 1 of Dupač–Hájek (1969). The assumptions (N1), (N2) and (4) correspond to (2.2a), (2.13) and (2.2a2) of Dupač–Hájek. Condition (3) above is not quite comparable to condition (2.12) of Dupač–Hájek, but it appears to be less restrictive. In any case, (2.12) and (2.13) together imply the boundedness of $\{\ell_{ni}(v)\}$ and hence the condition (5) above. Taken together, then, the assumptions of the above theorem are somewhat weaker than those of Dupač–Hájek. On the other hand, the conclusions of the Dupač–Hájek Theorem 1 are stronger than those of the above theorem in that it asserts not
only {Za(v) — pa(v)}oa-(v) 3 N(0,1) but also that Eloa'(v)(Za(v) — 


pa(v))}|’ — 0, for r= 1, 2,asn— . However, if one is only interested in 
the asymptotic normality of $\{Z_d(v)\}$, then the above theorem appears to be more desirable. Moreover, in view of the decomposition (3.2.11), the proof presented above makes the role played by conditions (3) and (4) clearer.

The assumption about $H$ being strictly increasing is not really an assumption because, without loss of generality, one may assume that $\{F_{ni}\}$ are not flat on a common interval. For, if all $\{F_{ni}\}$ were flat on a common interval, then deletion of this interval would not change the distribution of $R_1, \ldots, R_n$ and hence of $\{Z_d\}$. □


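Next, we turn to the asymptotic normality of $Z_d^+(v)$. Again, put $u = 0$ in the definition (3.3.1) to obtain


(12) $\mathcal{Z}_d^+(t) := \sum_i d_{ni} I(|X_{ni}| \le J_n^{-1}(t))\, s(X_{ni})$,
$p_{ni}(t) := F_{ni}(J^{-1}(t)) + F_{ni}(-J^{-1}(t)) - 2F_{ni}(0)$, $1 \le i \le n$,
$S_d^+(t) := \sum_i d_{ni} I(|X_{ni}| \le J^{-1}(t))\, s(X_{ni})$, $0 \le t \le 1$,
$\mu_d^+(t) := \sum_i d_{ni} p_{ni}(t)$, $0 \le t \le 1$, $Y_d^+ := S_d^+ - \mu_d^+$.
Like (3.2.9), we have
(13) $\sup_{0 \le t \le 1} |Z_d^+(t) - \mathcal{Z}_d^+(t)| \le 2\max_i |d_{ni}|$.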
Because of (N2), it suffices to consider $\mathcal{Z}_d^+$ only. Observe that
$Y_d^+(t) = Y_d(HJ^{-1}(t)) + Y_d(H(-J^{-1}(t))) - 2Y_d(H(0))$,
where $Y_d$ is as in (2.3.1). Rewrite
(14) $Y_d^+(t) = \{Y_d(HJ^{-1}(t)) - Y_d(H(0))\} - \{Y_d(H(0)) - Y_d(H(-J^{-1}(t)))\} =: Y_{d1}^+(t) - Y_{d2}^+(t)$, say.


This representation motivates the following notation, as it is required in the subsequent lemma. Let $p_i := F_i(0)$, $q_i := 1 - p_i$ and define, for $0 \le t \le 1$,
(15) $L_{i1}(t) := \{F_i(J^{-1}(t)) - p_i\}/q_i$ if $q_i > 0$, $:= 0$ if $q_i = 0$;
$L_{i2}(t) := \{p_i - F_i(-J^{-1}(t))\}/p_i$ if $p_i > 0$, $:= 0$ if $p_i = 0$; $1 \le i \le n$.


Observe that $p_{ni}(v) = q_i L_{i1}(v) - p_i L_{i2}(v)$, $1 \le i \le n$. Also define
(16) $L_i(t) := q_i L_{i1}(t) + p_i L_{i2}(t) = P(|X_i| \le J^{-1}(t))$, $1 \le i \le n$,


$L_{d1}^+(t) := \sum_i d_i^2 q_i L_{i1}(t)$, $L_{d2}^+(t) := \sum_i d_i^2 p_i L_{i2}(t)$, $0 \le t \le 1$.


Argue as for the proof of Lemma 2.2a.2 and use the triangle and the Chebyshev inequalities to conclude


Lemma 3.4.4. For every $\epsilon > 0$ and $0 < v < 1$ fixed,


$P(\sup_{|t - v| \le \delta} |Y_{dj}^+(t) - Y_{dj}^+(v)| > \epsilon)$


(17) $\le (\kappa + 20)\epsilon^{-2} [L_{dj}^+(v + \delta) - L_{dj}^+(v - \delta)]$, $j = 1, 2$,
where $\kappa$ does not depend on $\epsilon$, $\delta$ or any other underlying quantities. □
Theorem 3.4.2. Let $X_{n1}, \ldots, X_{nn}$ be independent r.v.'s with respective continuous d.f.'s $F_{n1}, \ldots, F_{nn}$ and let $d_{n1}, \ldots, d_{nn}$ be real numbers. Assume that $\{d_{ni}\}$ satisfy (N1), (N2). In addition, assume the following.


(18) With $\{L_{dj}^+\}$ as in (16), for $v$ fixed in $(0, 1)$,
$\lim_{\delta \to 0} \limsup_n [L_{dj}^+(v + \delta) - L_{dj}^+(v - \delta)] = 0$, $j = 1, 2$.


(19) There exist numbers $\{\ell_{ij}(v), 1 \le i \le n;\ j = 1, 2\}$ such that for all $0 < k < \infty$, $j = 1, 2$,


$\max_i \sup_{|t - v| \le k n^{-1/2}} n^{1/2} |L_{ij}(t) - L_{ij}(v) - (t - v)\ell_{ij}(v)| = o(1)$.
With
(20) $\bar d_n^+(v) := n^{-1}\sum_i d_{ni}\{q_i \ell_{i1}(v) - p_i \ell_{i2}(v)\}$,
$\tau^2(v) := \sum_i \{d_{ni}^2 [L_{ni}(v) - p_{ni}^2(v)] + (\bar d_n^+(v))^2 L_{ni}(v)(1 - L_{ni}(v)) - 2 d_{ni} \bar d_n^+(v)\, p_{ni}(v)(1 - L_{ni}(v))\}$,
assume that


(21a) $\liminf_n \tau^2(v) > 0$,

(21b) $\limsup_n n^{1/2} |\bar d_n^+(v)| < \infty$.
Then,

(22) $\{\tau(v)\}^{-1}[Z_d^+(v) - \mu_d^+(v)] \to_d N(0, 1)$,
where $\mu_d^+$ is as in (12).
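The variance $\tau^2(v)$ in (20) is the sum over $i$ of the variances of the independent summands of $K_d(v)$ in the proof below. For one observation, write $\alpha_i = P(0 < X_{ni} \le J^{-1}(v))$ and $\beta_i = P(-J^{-1}(v) \le X_{ni} < 0)$, so that $L_{ni}(v) = \alpha_i + \beta_i$ and $p_{ni}(v) = \alpha_i - \beta_i$; the summand then takes only three values and its variance can be enumerated exactly. The sketch below (NumPy assumed; names hypothetical; the numbers in the check are arbitrary) compares that enumeration with formula (20).

```python
import numpy as np


def tau2(d, dbar_plus, L, p):
    """tau^2(v) of (20) from arrays d_ni, L_ni(v), p_ni(v) and the scalar dbar_n^+(v)."""
    d = np.asarray(d, dtype=float)
    L = np.asarray(L, dtype=float)
    p = np.asarray(p, dtype=float)
    terms = (d**2 * (L - p**2) + dbar_plus**2 * L * (1.0 - L)
             - 2.0 * d * dbar_plus * p * (1.0 - L))
    return float(np.sum(terms))


def exact_summand_variance(d_i, dbar_plus, alpha, beta):
    """Exact variance of d_i[A - p] - dbar_plus[B - L] for one observation X, where
    A = I(|X| <= x) s(X), B = I(|X| <= x), alpha = P(0 < X <= x), beta = P(-x <= X < 0).
    """
    L, p = alpha + beta, alpha - beta        # E B and E A
    outcomes = [(1.0, 1.0, alpha), (-1.0, 1.0, beta), (0.0, 0.0, 1.0 - alpha - beta)]
    # the summand has mean zero, so its variance is its second moment
    return sum(prob * (d_i * (A - p) - dbar_plus * (B - L)) ** 2
               for A, B, prob in outcomes)
```

Setting $p_{ni} \equiv 0$ and $\bar d_n^+ = 0$ in `tau2` reproduces the symmetric case $\tau^2(v) = \sum_i d_{ni}^2 L_{ni}(v)$ of Remark 3.4.3.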


Proof. The proof of this theorem is similar to that of Theorem 3.4.1, so we shall be brief. To begin with, by (13) and (N2) it suffices to prove that


$\{\tau(v)\}^{-1} T_d^+(v) \to_d N(0, 1)$, where $T_d^+(v) := \mathcal{Z}_d^+(v) - \mu_d^+(v)$.
Apply Lemma 3.4.1 above to the r.v.'s $|X_{n1}|, \ldots, |X_{nn}|$ to conclude


$\sup_{0 \le t \le 1} |JJ_n^{-1}(t) - t| = o_p(1)$.

From this, (14), (17) and (18),

$T_d^+(v) = Y_d^+(JJ_n^{-1}(v)) + \mu_d^+(JJ_n^{-1}(v)) - \mu_d^+(v)$

$\quad = Y_d^+(v) + [\mu_d^+(JJ_n^{-1}(v)) - \mu_d^+(v)] + o_p(1)$.
Again, apply arguments like those that yielded (9) to $\{|X_{ni}|\}$ to obtain

$n^{1/2}[JJ_n^{-1}(v) - v] = -Y_1^*(v) + o_p(1)$,

where $Y_1^*(v)$ is as in (3.3.19) with $t = v$ and $u = 0$. Consequently,


$T_d^+(v) = Y_d^+(v) - n^{1/2} \bar d_n^+(v) Y_1^*(v) + o_p(1) = K_d(v) + o_p(1)$,


where
$K_d(v) := Y_d^+(v) - n^{1/2} \bar d_n^+(v) Y_1^*(v)$


$\quad = \sum_i \{d_{ni}[I(J(|X_{ni}|) \le v)\, s(X_{ni}) - p_{ni}(v)] - \bar d_n^+(v)[I(J(|X_{ni}|) \le v) - L_{ni}(v)]\}$.
Note that $\mathrm{Var}(K_d(v)) = \tau^2(v)$. The proof of the theorem is now completed by using the L-F CLT, which is justified in view of (N1), (N2) and (21a). □


Remark 3.4.3. Observe that if $\{F_{ni}\}$ are symmetric about 0, then $p_{ni} \equiv 0 \equiv \bar d_n^+$ and $\tau^2(v) = \sum_i d_{ni}^2 L_{ni}(v)$. □


Remark 3.4.4. An alternative proof of (22), using the techniques of Dupač and Hájek (op. cit.), appears in Koul and Staudte, Jr. (1972a). Comments like those in Remark 3.4.1 are appropriate here also.


Next, we turn to the weak convergence of $\{Z_d\}$ and $\{Z_d^+\}$. These results will be stated without proofs, as their proofs are consequences of the results of the previous sections in this chapter.


Theorem 3.4.3 (Weak convergence of $Z_d$). Let $X_{n1}, \ldots, X_{nn}$ be independent r.v.'s with respective continuous d.f.'s $F_{n1}, \ldots, F_{nn}$. With notation as in (2.2a.33), assume that (N1), (N2), (C*) hold. In addition, assume the following:


(23) There are measurable functions $\{\ell_{ni}, 1 \le i \le n\}$ on $[0, 1]$ such that for all $0 < k < \infty$,


$\max_i \sup_{|t - s| \le k n^{-1/2}} n^{1/2} |L_{ni}(t) - L_{ni}(s) - (t - s)\ell_{ni}(s)| = o(1)$.


Moreover, assume that


(24) $\limsup_n \sup_{0 \le t \le 1} n^{1/2} |\bar d_n(t)| < \infty$,
(25) $\lim_{\delta \to 0} \limsup_n \sup_{|t - s| \le \delta} n^{1/2} |\bar d_n(t) - \bar d_n(s)| = 0$,
(26) $\liminf_n \sigma_d^2(t) > 0$, $0 < t < 1$.


Finally, with $K_d(t) := \sum_i (d_{ni} - \bar d_n(t))\{I(X_{ni} \le H^{-1}(t)) - L_{ni}(t)\}$, assume that


(27) $C(s, t) := \lim_n \mathrm{Cov}(K_d(s), K_d(t)) = \lim_n \sum_i (d_{ni} - \bar d_n(t))(d_{ni} - \bar d_n(s)) L_{ni}(s)(1 - L_{ni}(t))$
exists for all $0 \le s \le t \le 1$.


Then $Z_d - \mu_d$ converges weakly to a mean zero, covariance $C$, continuous Gaussian process on $[0, 1]$, tied down at 0 and 1. □
Remark 3.4.5. In (23), without loss of generality it may be assumed that $n^{-1}\sum_i \ell_{ni}(s) = 1$, $0 \le s \le 1$. For, if (23) holds for some $\{\ell_{ni}, 1 \le i \le n\}$, then it also holds for $\{\ell_{ni}^*, 1 \le i \le n\}$, $\ell_{ni}^*(s) := n^{1/2}[L_{ni}(s + n^{-1/2}) - L_{ni}(s)]$, $1 \le i \le n$, $0 \le s \le 1$. Because $n^{-1}\sum_i L_{ni}(s) = s$, $n^{-1}\sum_i \ell_{ni}^*(s) = 1$. □


Remark 3.4.6. Conditions (C*), (N1) and (24) may be replaced by the 
condition (B), because, in view of the previous remark, 


Remark 3.4.7. In the case $F_{ni}$ have densities $f_{ni}$, one can choose $\ell_{ni} = f_{ni}(H^{-1}) / \{n^{-1}\sum_j f_{nj}(H^{-1})\}$, $1 \le i \le n$. □
Remark 3.4.8. In the case $F_{ni} \equiv F$, a continuous and strictly increasing d.f., $L_{ni}(t) = t$ and $\ell_{ni}(t) \equiv 1$, so that (C*) and (23) to (26) are trivially satisfied. Moreover, $C(s, t) = s(1 - t)$, $0 \le s \le t \le 1$, so that (27) is satisfied.


Thus Theorem 3.4.3 includes Theorem V.3.5.1 of Hájek and Šidák (1967). □


Theorem 3.4.4 (Weak convergence of $Z_d^+$). Let $X_{n1}, \ldots, X_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \ldots, F_{nn}$ and let $d_{n1}, \ldots, d_{nn}$ be real numbers. Assume that (N1) and (N2) hold and that the following hold.


(28) With $L_{dj}^+$ as in (16),


$\lim_{\delta \to 0} \limsup_n \sup_{0 \le t \le 1 - \delta} [L_{dj}^+(t + \delta) - L_{dj}^+(t)] = 0$, $j = 1, 2$.


(29) There are measurable functions $\ell_{ij}$, $1 \le i \le n$, $j = 1, 2$, on $[0, 1]$ such that for any $0 < k < \infty$,


$\max_i \sup_{|t - s| \le k n^{-1/2}} n^{1/2} |L_{ij}(t) - L_{ij}(s) - (t - s)\ell_{ij}(s)| = o(1)$.


(30) With $\bar d_n^+$ as in (20),


$\limsup_n \sup_{0 \le t \le 1} n^{1/2} |\bar d_n^+(t)| < \infty$,


(31) $\lim_{\delta \to 0} \limsup_n \sup_{|t - s| \le \delta} n^{1/2} |\bar d_n^+(t) - \bar d_n^+(s)| = 0$.

(32) With $\tau^2$ as in (20),


$\liminf_n \tau^2(t) > 0$, $0 < t < 1$.
(33) With $K_d(t)$ as in the proof of Theorem 3.4.2,
$\lim_n \mathrm{Cov}(K_d(s), K_d(t)) = C^+(s, t)$ exists, $0 \le s \le t \le 1$.


Then $Z_d^+ - \mu_d^+$ converges weakly to a continuous, mean zero, covariance $C^+$ Gaussian process, tied down at 0. □


Remark 3.4.9. Remarks 3.4.5 through 3.4.7 are applicable here also, with appropriate modifications. □


Remark 3.4.10. Suppose that $F_{ni} \equiv F$, $F$ continuous, and $d_{ni} \equiv n^{-1/2}$. Then


$\sup_{0 \le t \le 1} |Z_d^+(t) - \mu_d^+(t)| = \sup_{0 \le x < \infty} n^{1/2} |\{H_n(x) - H_n(0)\} - \{H_n(0) - H_n(-x)\} - [\{F(x) - F(0)\} - \{F(0) - F(-x)\}]|$,


which is precisely the statistic proposed by Smirnov (1947) to test the hypothesis that $F$ is symmetric about 0. Smirnov considered only the null distribution. Theorem 3.4.4 allows one to study its asymptotic distribution under fairly general independent alternatives.

If $\{d_{ni}\}$ are arbitrary, subject to (N1) and (N2), then $\sup\{|Z_d^+(t) - \mu_d^+(t)|;\ 0 \le t \le 1\}$ may be considered a generalized Smirnov statistic for testing the hypothesis of symmetry. □
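Under the hypothesis that $F$ is symmetric about 0, $\mu_d^+ \equiv 0$, and with $d_{ni} \equiv n^{-1/2}$ the statistic of Remark 3.4.10 reduces to $n^{-1/2}\max_k |\sum_{j \le k} s_{(j)}|$, where $s_{(j)}$ is the sign of the observation with the $j$-th smallest absolute value. This is because $\{H_n(x) - H_n(0)\} - \{H_n(0) - H_n(-x)\}$ changes only at the points $|X_{ni}|$. A NumPy sketch under that null hypothesis, assuming no zeros and no ties among the $|X_{ni}|$ (function name hypothetical):

```python
import numpy as np


def smirnov_symmetry_statistic(X):
    """sup_x n^{1/2}|{H_n(x) - H_n(0)} - {H_n(0) - H_n(-x)}|, i.e. sup_t |Z_d^+(t)|
    with d_ni = n^{-1/2} and mu_d^+ = 0 (F symmetric about 0).
    """
    X = np.asarray(X, dtype=float)
    signs = np.sign(X[np.argsort(np.abs(X))])   # s(X_ni) ordered by |X_ni|
    return float(np.max(np.abs(np.cumsum(signs))) / np.sqrt(X.size))
```

For the generalized statistic with arbitrary weights, one would replace the unit signs by $d_{ni}s(X_{ni})$ in the same order, still under the null hypothesis.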


CHAPTER 4 


M, R AND SOME SCALE ESTIMATORS


4.1. INTRODUCTION 


In the last three decades statistics has seen the emergence and consolidation of many competitors of the Least Squares estimator of $\beta$ of (1.1.1). The most prominent are the so-called M- and R-estimators. The class of M-estimators was introduced by Huber (1973), and its computational aspects and some robustness properties are available in Huber (1981). The class of R-estimators is based on the ideas of Hodges and Lehmann (1963) and has been developed by Adichie (1967), Jurečková (1971) and Jaeckel (1972).

One of the attractive features of these estimators is that they are 
robust against certain outliers in errors. All of these estimators are 
translation invariant, whereas only R-estimators are scale invariant. 

Our purpose here is to illustrate the usefulness of the results of 
Chapter 2 in deriving the asymptotic distributions of these estimators under 
a fairly general class of heteroscedastic errors. Section 4.2a gives the 
asymptotic distributions of M-estimators while those of R-estimators are 
given in Section 4.4. Among other things, the results obtained enable one to 
study their qualitative robustness against an array of non-identical error 
d.f.’s converging to a fixed error d.f. The sufficient conditions given here are 
fairly general for the underlying score functions and the design variables. 

Efron (1979) introduced a general resampling procedure, called the bootstrap, for estimating the distribution of a pivotal statistic. Singh (1981) showed that the bootstrap estimate $B_n$ is second order accurate, i.e., it provides a more accurate approximation to the sampling distribution $G_n$ of the standardized sample mean than the usual normal approximation, in the sense that $\sup\{|G_n(x) - B_n(x)|;\ x \in \mathbb{R}\}$ tends to zero at a faster rate than $n^{-1/2}$, the rate of the normal approximation. This kind of result holds more generally, as noted by Babu and Singh (1983, 1984).

Section 4.2b discusses similar results pertaining to a class of M-estimators of $\beta$ when the errors in (1.1.1) are i.i.d. It is noted that Shorack's (1982) modified bootstrap estimator and the one obtained by resampling the residuals according to a w.e.p. are second order accurate.

In an attempt to make M-estimators scale invariant one often needs a preliminary robust scale estimator. Two such estimators are the MAD (median of absolute deviations of residuals) and the MASD (median of absolute symmetrized deviations of residuals). The asymptotic distributions of these estimators under heteroscedastic errors appear in Section 4.3.

In carrying out the analysis of variance of an experimental design or a linear model based on ranks, one needs an estimator of the asymptotic variance of certain rank statistics; see, e.g., Hettmansperger (1984). These variances involve the functional $Q(f) = \int f\, d\psi(F)$, where $\psi$ is a known function and $F$ a common error d.f. having a density $f$. Some estimators of $Q(f)$ under (1.1.1) are presented in Section 4.5. Again, the results of Chapter 2 are found useful in proving their consistency.


4.2. M-ESTIMATORS 
4.2a. First Order Approximations: Asymptotic Normality 


This subsection contains the asymptotic distributions of M-estimators of $\beta$ when the errors in (1.1.1) are heteroscedastic. The following subsection 4.2b gives some results on the bootstrap approximations to these distributions.

Let the model (1.1.1) hold. Let $\psi$ be a nondecreasing function from $\mathbb{R}$ to $\mathbb{R}$. The corresponding M-estimator $\hat\beta$ of $\beta$ is defined to be a zero of the M-score $\int \psi(y)\, V(dy, t)$, where $V$ is defined at (1.1.2). Our objective is to investigate the asymptotic behavior of $A^{-1}(\hat\beta - \beta)$ when the errors in (1.1.1) are heteroscedastic. Our method is still the usual one, viz., to obtain the expansion of the M-score uniformly in $t \in \{t;\ \|A^{-1}(t - \beta)\| \le B\}$, $0 < B < \infty$, to observe that there is a zero $\hat\beta$ of the M-score in this set, and then to apply this expansion to obtain the approximation for $A^{-1}(\hat\beta - \beta)$ in terms of the given M-score at the true $\beta$. To make all this precise, we need to standardize the M-score. For that reason we need some more notation.
Let


(1) $\Lambda(y) := \mathrm{diag}(f_{n1}(y), \ldots, f_{nn}(y))$, $y \in \mathbb{R}$,
$C := AX'\left(\int \Lambda(y)\, d\psi(y)\right) XA$,
$T(\psi, t) := C^{-1} A \int \psi(y)\, V(dy, t)$,

$\tilde T(\psi, t) := T(\psi, \beta) - A^{-1}(t - \beta)$, $t \in \mathbb{R}^p$.


An approximation to $\hat\beta$ is given by the zero $\tilde\beta$ of $\tilde T(\psi, t)$, viz.,


(2) $A^{-1}(\tilde\beta - \beta) = T(\psi, \beta)$.
A basic result needed to make this precise is the a.u.l. of $T(\psi, t)$ in $A^{-1}(t - \beta)$. Often such a result is obtained under some smoothness conditions on $\psi$ and under i.i.d. errors. Theorem 4.2a.1 below gives such a result for a general nondecreasing right continuous bounded $\psi$ and for fairly general independent heteroscedastic errors.
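To make the M-score concrete, take $V(y, t)$ to be the $x$-weighted empirical d.f. of the residuals $Y_i - x_i't$ (our reading of (1.1.2), stated here as an assumption), so that the M-score is $\sum_i x_i \psi(Y_i - x_i't)$. The sketch below computes a zero for Huber's $\psi(x) = \max(-k, \min(x, k))$, which is nondecreasing, continuous and bounded; iteratively reweighted least squares is one standard way of computing it, not a method taken from the text. `inv_sqrt` returns $A = (X'X)^{-1/2}$, from which $A^{-1}(\hat\beta - \beta)$ can be formed. NumPy is assumed and all names are hypothetical.

```python
import numpy as np


def huber_psi(x, k=1.345):
    """Huber's score: nondecreasing, continuous and bounded by k."""
    return np.clip(x, -k, k)


def m_estimate(X, Y, k=1.345, max_iter=500, tol=1e-12):
    """A zero t of sum_i x_i psi(Y_i - x_i't), computed by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]     # least squares starting value
    for _ in range(max_iter):
        r = Y - X @ beta
        w = np.ones_like(r)                          # psi(r)/r = 1 when |r| <= k
        big = np.abs(r) > k
        w[big] = k / np.abs(r[big])                  # psi(r)/r = k/|r| otherwise
        XtW = X.T * w                                # X' diag(w)
        new = np.linalg.solve(XtW @ X, XtW @ Y)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


def inv_sqrt(M):
    """M^{-1/2} for symmetric positive definite M; with M = X'X this is A."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs / np.sqrt(vals)) @ vecs.T
```

At a fixed point of the iteration, $\sum_i x_i w_i (Y_i - x_i'\beta) = \sum_i x_i \psi(Y_i - x_i'\beta) = 0$, which is exactly the defining equation of $\hat\beta$.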
Theorem 4.2a.1. Let $\{(x_{ni}, Y_{ni}), 1 \le i \le n\}$, $\beta$, $\{F_{ni}, 1 \le i \le n\}$ be as in the model (1.1.1), satisfying all conditions of Theorem 2.3.3. In addition, assume the following:


(3) $\psi \in \Psi := \{\psi: \mathbb{R} \to \mathbb{R},\ \psi \in \mathcal{DI}(\mathbb{R}),\ \text{bounded with } \|\psi\|_\infty \le k < \infty\}$.
(4) $\limsup_n \|C^{-1}\| < \infty$.
Then, for every $0 < B < \infty$,
(5) $\sup_{\psi, u} \|T(\psi, \beta + Au) - \tilde T(\psi, \beta + Au)\| = o_p(1)$,
where the supremum is taken over all $\psi \in \Psi$ and $\|u\| \le B$.
Proof. Rewrite, after integration by parts,

$T(\psi, t) - \tilde T(\psi, t) = -\int C^{-1} A [V(y, t) - V(y, \beta) - X'\Lambda(y) X (t - \beta)]\, d\psi(y)$.
Now (5) readily follows from this and (2.3.37). □


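In order to use this theorem, we must be able to argue that $\|A^{-1}(\hat\beta - \beta)\| = O_p(1)$. To that effect, define


$\mu_i := E\psi(e_i)$, $\tau_i^2 := \mathrm{Var}\,\psi(e_i)$, $1 \le i \le n$,
$b_n := E\, T(\psi, \beta) = C^{-1} A \sum_i x_i \mu_i$,


and observe that
$E\|A^{-1}(\tilde\beta - \beta) - b_n\|^2 = \mathrm{tr}\{C^{-1} A (\sum_i x_i x_i' \tau_i^2) A C^{-1}\} \le \|C^{-1}\|^2 \sum_i \tau_i^2 x_i'(X'X)^{-1} x_i = O(1)$,


by (3), (4) and the fact that $\sum_i x_i'(X'X)^{-1} x_i = p < \infty$. Hence, by the Markov inequality, for every $\epsilon > 0$ there is a $0 < K_\epsilon < \infty$ such that
$P(\|A^{-1}(\tilde\beta - \beta) - b_n\| \le K_\epsilon) \ge 1 - \epsilon$, for all $n \ge 1$.
Thus, assuming that
(6) $\sum_i x_i \mu_i = 0$,


and arguing via Brouwer's fixed point theorem as in Huber (1981, p. 169), one concludes, in view of (5), that for every $\epsilon > 0$ there exist $N_\epsilon$ and $K_\epsilon$ such that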
(7) $P(\|A^{-1}(\hat\beta - \beta)\| \le K_\epsilon) \ge 1 - \epsilon$, for all $n \ge N_\epsilon$.


Now, a routine application of (5) enables one to conclude that


$0 = T(\psi, \hat\beta) = T(\psi, \beta) - A^{-1}(\hat\beta - \beta) + o_p(1)$,


i.e.,


(8) $A^{-1}(\hat\beta - \beta) = T(\psi, \beta) + o_p(1)$.
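A small simulation illustrates (8) in the simplest i.i.d. setting. The sketch assumes $N(0,1)$ errors, Huber's $\psi$ with $k = 1.345$, and a design with an intercept and one uniform covariate (all of these are our choices). Then $C = (\int f\, d\psi) I_{p \times p}$ with $\int f\, d\psi = 2\Phi(k) - 1$, (6) holds by symmetry, and the zero $\tilde\beta$ of (2) satisfies $\tilde\beta - \beta = (\int f\, d\psi)^{-1}(X'X)^{-1}\sum_i x_i \psi(e_i)$. The function returns $\|A^{-1}(\hat\beta - \tilde\beta)\|$, which (8) says is $o_p(1)$; for one moderate $n$ it should be small rather than zero. NumPy and the standard `statistics.NormalDist` are assumed; names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def huber_m_estimate(X, Y, k, iters=500):
    """Zero of sum_i x_i psi(Y_i - x_i't) for Huber's psi(x) = clip(x, -k, k), via IRLS."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    for _ in range(iters):
        r = Y - X @ beta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-300))  # psi(r)/r
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ Y)
    return beta


def linearization_gap(n=5000, k=1.345, seed=5):
    """||A^{-1}(beta_hat - beta_tilde)|| in one simulated design with i.i.d. N(0, 1) errors."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
    beta = np.array([1.0, 2.0])
    e = rng.standard_normal(n)
    Y = X @ beta + e
    XtX = X.T @ X
    vals, vecs = np.linalg.eigh(XtX)
    A_inv = (vecs * np.sqrt(vals)) @ vecs.T        # A^{-1} = (X'X)^{1/2}
    slope = 2.0 * NormalDist().cdf(k) - 1.0        # int f dpsi = P(|e| < k) for N(0, 1)
    # zero of T-tilde: A^{-1}(beta_tilde - beta) = C^{-1} A sum_i x_i psi(e_i)
    beta_tilde = beta + np.linalg.solve(XtX, X.T @ np.clip(e, -k, k)) / slope
    beta_hat = huber_m_estimate(X, Y, k)
    return float(np.linalg.norm(A_inv @ (beta_hat - beta_tilde)))
```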
Note that, under (6), with $T_0(\psi, \beta) := C\, T(\psi, \beta)$,
$E\, T_0(\psi, \beta) T_0'(\psi, \beta) = A \sum_i x_i x_i' \tau_i^2 A = AX'\tau XA$,


where $\tau := \mathrm{diag}(\tau_1^2, \ldots, \tau_n^2)$. Moreover, for any $\lambda \in \mathbb{R}^p$,
$\lambda' T_0(\psi, \beta) = \sum_i \sum_j \lambda_j d_{ij} \psi(e_i) = \sum_i \lambda' A x_i\, \psi(e_i)$,


where $\{d_{ij}\}$ are as in (2.3.32). In view of (2.3.33) and (2.3.34), (NX) and (6) imply that $\lambda' T_0(\psi, \beta)$ is asymptotically normally distributed with mean 0 and asymptotic variance $\lambda' AX'\tau XA \lambda$. Thus, by the Cramér–Wold device [Theorem 7.7, p. 49, Billingsley (1968)], (4) and (8),


(9) $\Sigma^{-1/2} A^{-1}(\hat\beta - \beta) \to_d N(0, I_{p \times p})$, $\Sigma := C^{-1} AX'\tau XA C^{-1}$.


We summarize the above discussion as a

Proposition 4.2a.1. Suppose that the d.f.'s $\{F_{ni}\}$ of the errors and the design matrix $X$ of (1.1.1) satisfy (4), (6) and the assumptions of Theorem 2.3.3, including that $H$ is strictly increasing for each $n \ge 1$. Then (9) holds. □


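Now, consider the case of i.i.d. errors in (1.1.1) with $F_{ni} \equiv F$. Then


(10) $\tau_i^2 = \int \psi^2\, dF - (\int \psi\, dF)^2 =: \tau^2$ (say), $1 \le i \le n$,
$C = (\int f\, d\psi)\, I_{p \times p}$, $\Sigma = (\int f\, d\psi)^{-2} \tau^2 I_{p \times p}$.


Consequently (4) is equivalent to requiring $\int f\, d\psi > 0$. Next, observe that (6) becomes


(6*) $\sum_i x_i \int \psi\, dF = 0$.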
Obviously, this is satisfied if either $\sum_i x_i = 0$, i.e., if $X$ is a centered design matrix, or if $\int \psi\, dF = 0$, the often assumed condition. The former excludes the possibility of the presence of the location parameter in (1.1.1). Thus, to summarize, we have


Proposition 4.2a.2. Suppose that in (1.1.1), $F_{ni} \equiv F$. In addition, assume that $X$ and $F$ satisfy (NX), (F1), (F2), (6*) and that $\int f\, d\psi > 0$. Then,


$A^{-1}(\hat\beta - \beta) \to_d N_p(0, \tau^2/(\int f\, d\psi)^2\, I_{p \times p})$. □
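To use Proposition 4.2a.2 for inference one needs $\tau^2/(\int f\, d\psi)^2$. For Huber's $\psi(x) = \max(-k, \min(x, k))$ one has $\int f\, d\psi = P(|e| < k)$, which suggests the simple plug-in estimate below, computed from residuals that are assumed to approximate the errors. This is a sketch under those assumptions (NumPy; name hypothetical), not an estimator studied in the text.

```python
import numpy as np


def huber_plugin_variance(residuals, k=1.345):
    """Plug-in estimate of tau^2 / (int f dpsi)^2 for psi(x) = clip(x, -k, k).

    tau^2 is estimated by the empirical variance of psi(r_i), and
    int f dpsi = P(|e| < k) by the fraction of residuals with |r_i| < k.
    """
    r = np.asarray(residuals, dtype=float)
    psi = np.clip(r, -k, k)
    tau2 = np.mean(psi**2) - np.mean(psi) ** 2
    slope = np.mean(np.abs(r) < k)
    return float(tau2 / slope**2)
```

For $N(0,1)$ errors and $k = 1.345$ the target value is about $1.05$, i.e., the familiar 95% efficiency of this choice of $k$; letting $k \to \infty$ recovers the least squares variance.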


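Condition (6*) suggests another way of defining M-estimators of $\beta$ in (1.1.1) in the case of i.i.d. errors. Let


(11) $\bar x_{nj} := n^{-1}\sum_i x_{nij}$, $1 \le j \le p$; $\bar x_n := (\bar x_{n1}, \ldots, \bar x_{np})'$;
$\bar X := [\bar x_n, \ldots, \bar x_n]'_{n \times p}$; $X_c := X - \bar X$.

Assume

(NX1) $(X_c'X_c)^{-1}$ exists for all $n > p$, and
$\max_i (x_{ni} - \bar x_n)'(X_c'X_c)^{-1}(x_{ni} - \bar x_n) = o(1)$.
Let
(12) $T^*(\psi, t) := A_c \sum_i (x_{ni} - \bar x_n)\psi(Y_{ni} - x_{ni}'t)$, $t \in \mathbb{R}^p$, $A_c := (X_c'X_c)^{-1/2}$.


Define an M-estimator $\hat\beta^*$ as a solution $t$ of


(13) $T^*(\psi, t) = 0$.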

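Apply Corollary 2.3.1 $p$ times, the $j$th time with $d_{ni}$ = the $i$th element of the $j$th column of $X_c A_c$, $1 \le i \le n$, $1 \le j \le p$, to conclude an analogue of (5) above, viz.,
(5*) $\sup_{\psi \in \Psi,\ \|A_c^{-1}(t - \beta)\| \le B} \|(\int f\, d\psi)^{-1} T^*(\psi, t) - \tilde T^*(\psi, t)\| = o_p(1)$,
where


$\tilde T^*(\psi, t) := (\int f\, d\psi)^{-1} T^*(\psi, \beta) - A_c^{-1}(t - \beta)$.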
The proof of (5*) is exactly similar to that of (5), with the appropriate modifications of replacing $X$ by $X_c$ and $A$ by $A_c$, and using $F_{ni} \equiv F$ in the discussion there.


Now, clearly, $F_{ni} \equiv F$ implies that $E\, T^*(\psi, \beta) = 0$ and
$E\|T^*(\psi, \beta)\|^2 = \sum_i (x_i - \bar x)' A_c^2 (x_i - \bar x)\tau^2 = p\tau^2 = O(1)$.
Hence, $\|T^*(\psi, \beta)\| = O_p(1)$. If $\hat\beta^*$ is a zero of $T^*(\psi, \cdot)$, then
$A_c^{-1}(\hat\beta^* - \beta) = (\int f\, d\psi)^{-1} T^*(\psi, \beta) + o_p(1)$.
Argue as for (7), (8) and (9) to conclude the following


Proposition 4.2a.3. Suppose that in (1.1.1), $F_{ni} \equiv F$. In addition, assume that $X$ and $F$ satisfy (NX1), (F1) and (F2). Then,


$A_c^{-1}(\hat\beta^* - \beta) \to_d N_p(0, \tau^2/(\int f\, d\psi)^2\, I_{p \times p})$,
where $\hat\beta^*$ is as in (13). □


Remark 4.2a.2. Note that Proposition 4.2a.3 does not require the condition $\int \psi\, dF = 0$. An advantage of this is that $\hat\beta^*$ can be used as a preliminary estimator when constructing adaptive estimators of $\beta$. An adaptive estimator is one that achieves the Hájek–Le Cam (Hájek 1972, Le Cam 1972) lower bound over a large class of error distributions. Often, a minimal condition required to construct an adaptive estimator of $\beta$ is that $F$ have finite Fisher information, i.e., that $F$ satisfy (3.2.4) of Theorem 3.2.3. See, e.g., Bickel (1982), Fabian and Hannan (1982) and Koul and Susarla (1983). Recall, from Remark 3.2.2, that this implies (F1).

On the other hand, the condition (NX1) does not allow for any location term in the linear regression model. □


So far we have been dealing with the linear regression model with known scale. Now consider the model (2.3.38), where $\gamma$ is an unknown scale parameter. Let $s$ be an $n^{1/2}$-consistent estimator of $\gamma$, i.e.,


(14) $|n^{1/2}(s - \gamma)| = O_p(1)$.


Define an M-estimator $\hat\beta_1$ of $\beta$ as a solution $t$ of


(15) $\sum_i x_i \psi((Y_i - x_i't)/s) = 0$, or $\int \psi(y)\, V(s\,dy, t) = 0$.
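Computationally, (15) is again a weighted least squares fixed point, now with weights $\psi(z)/z$ at $z = (Y_i - x_i't)/s$. In the sketch below $s$ is supplied by the user; the helper `normalized_mad` (the median absolute deviation about the median, divided by $\Phi^{-1}(3/4)$) is one common choice of preliminary scale under normal errors and is our assumption, not necessarily the MAD studied in Section 4.3. Huber's $\psi$ and NumPy are assumed; names are hypothetical.

```python
import numpy as np


def scaled_m_estimate(X, Y, s, k=1.345, iters=500):
    """A zero t of sum_i x_i psi((Y_i - x_i't)/s) for Huber's psi and a given scale s > 0."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    for _ in range(iters):
        z = (Y - X @ beta) / s
        w = np.minimum(1.0, k / np.maximum(np.abs(z), 1e-300))   # psi(z)/z
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ Y)
    return beta


def normalized_mad(r):
    """Median absolute deviation about the median, scaled to estimate gamma under N(0, gamma^2)."""
    r = np.asarray(r, dtype=float)
    return float(np.median(np.abs(r - np.median(r))) / 0.6744897501960817)
```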
To keep the exposition simple, we shall now not exhibit $\psi$ in some of the functions defined below. Define, for $a > 0$, $t \in \mathbb{R}^p$,
(16) $S(a, t) := A \int \psi(y)\, V(a\,dy, t)$,


$\tilde S(a, t) := C^{-1}S(\gamma, \beta) - A^{-1}(t - \beta)\gamma^{-1} - C^{-1} C_1 n^{1/2}(a - \gamma)\gamma^{-1}$,
where
$C_1 := n^{-1/2} AX' \int y\, \mathbf{f}(y)\, d\psi(y)$, $\mathbf{f}(y) := (f_1(y), \ldots, f_n(y))'$,
and where $C$ is as in (1) above. Note that by (NX), (F1), (F3) and (3),
(17) $\|C_1\| = O(1)$.
The following theorem is a direct consequence of Theorem 2.3.4. In it,


$N_1 := \{(a, t): a > 0,\ t \in \mathbb{R}^p,\ \|A^{-1}(t - \beta)\| \le B,\ |n^{1/2}(a - \gamma)| \le b\}$, $0 < b, B < \infty$.


Theorem 4.2a.2. Let $\{(x_{ni}, Y_{ni}), 1 \le i \le n\}$, $\beta$, $\gamma$, $\{F_{ni}, 1 \le i \le n\}$ be as in (2.3.38), satisfying all the conditions of Theorem 2.3.4. Moreover, assume (3) and (4) hold. Then, for every $0 < b, B < \infty$ fixed,


(18) $\sup \|C^{-1}S(a, t) - \tilde S(a, t)\| = o_p(1)$,

where the supremum is taken over all $\psi \in \Psi$ and $(a, t')' \in N_1$. □
Now argue as in the proof of Proposition 4.2a.1 to conclude
Proposition 4.2a.4. Suppose that the design matrix $X$ and the d.f.'s $\{F_{ni}\}$ of $\{e_{ni}\}$ in (2.3.38) satisfy (4), (6) and the assumptions of Theorem 2.3.4, including that $H$ is strictly increasing for each $n \ge 1$. In addition, assume that there exists an estimate $s$ of $\gamma$ satisfying (14). Then


(19) $A^{-1}(\hat\beta_1 - \beta)\gamma^{-1} = C^{-1}S(\gamma, \beta) - C^{-1}C_1 n^{1/2}(s - \gamma)\gamma^{-1} + o_p(1)$,
where $\hat\beta_1$ now is a solution of (15). □


Remark 4.2a.3. In (6), Fy is now the df. of ¢;, and not of ei, 
1<¢i¢n. O 


Remark 4.2a.4. Effect of symmetry on Ay. As is clear from (19), in 


general the asymptotic distribution of A, depends on s. However, suppose 
that 


(20) dyy) =—dX{-y), fily) = fi(-y), 1<i¢n, -w<y<+o. 
Then fy fi(y) dy) = 0, 1<i <n, and, from (16), C; = 0. Consequently, 
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in this case, 


A-*(Ai— B) = yC * S(7, B) + 09(1). 


Hence, with ¥ as in (9), we obtain 
-1/2,-1,4 a | 
(21) AA, A) 7* — Np(0, Ip). 


Note that this asymptotic distribution differs from that of (9) only by the presence of $\gamma^{-1}$. In other words, in the case of symmetric errors and skew-symmetric score functions $\psi$, the asymptotic distribution of the M-estimator of $\beta$ of (2.3.38) with a preliminary $n^{1/2}$-consistent estimator of the scale parameter is, up to the factor $\gamma$, the same as that of the M-estimator of $\beta$ of (1.1.1). □
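As a computational aside, once $s$ is fixed, equation (15) can be solved by iteratively reweighted least squares. The Python sketch below is only an illustration, and it rests on assumptions that the text does not make. It uses Huber's skew-symmetric score $\psi(u) = \max(-k, \min(k, u))$ with the conventional $k = 1.345$, a least squares preliminary fit, and $s$ taken as the median absolute least squares residual (the estimator $s_1$ of Section 4.3). The function names are hypothetical.

```python
import numpy as np


def huber_psi(u, k=1.345):
    """Huber's skew-symmetric score psi(u) = max(-k, min(k, u))."""
    return np.clip(u, -k, k)


def m_estimate_with_scale(X, Y, k=1.345, tol=1e-12, max_iter=500):
    """Solve sum_i x_i psi((Y_i - x_i't)/s) = 0 for t, with s held fixed.

    Sketch assumptions: Huber's psi, a least squares preliminary fit and
    s = median |least squares residual|.  Returns (t, s).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    t = np.linalg.lstsq(X, Y, rcond=None)[0]        # preliminary estimate
    s = np.median(np.abs(Y - X @ t))                 # preliminary scale, kept fixed
    for _ in range(max_iter):
        u = (Y - X @ t) / s
        # IRLS weights w(u) = psi(u)/u: 1 when |u| <= k, k/|u| otherwise
        w = np.ones_like(u)
        big = np.abs(u) > k
        w[big] = k / np.abs(u[big])
        XtW = X.T * w                                # X' diag(w)
        t_new = np.linalg.solve(XtW @ X, XtW @ Y)    # weighted least squares step
        done = np.max(np.abs(t_new - t)) < tol * (1.0 + np.max(np.abs(t)))
        t = t_new
        if done:
            break
    return t, s
```

At a fixed point, $\sum_i x_i w_i (Y_i - x_i't) = s\sum_i x_i\psi((Y_i - x_i't)/s) = 0$, so the iteration solves (15). Because the least squares fit and $s$ are location and scale invariant, the resulting $t$ inherits both properties.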


4.2b. Bootstrap Approximations 


Before discussing the specific bootstrap approximations, we shall describe the concept of Efron's bootstrap a bit more generally in the one sample setup.

Let $\xi_1, \xi_2, \dots, \xi_n$ be i.i.d. $G$ r.v.'s, let $G_n$ be their empirical d.f., and let $T_n = T_n(\xi_n, G)$ be a function of $\xi_n' := (\xi_1, \xi_2, \dots, \xi_n)$ and $G$ such that $T_n(\xi_n, G)$ is a r.v. for every $G$. Let $\zeta_1, \zeta_2, \dots, \zeta_n$ denote i.i.d. $G_n$ r.v.'s and $\zeta_n' := (\zeta_1, \zeta_2, \dots, \zeta_n)$. The bootstrap d.f. $B_n$ of $T_n(\xi_n, G)$ is the d.f. of $T_n(\zeta_n, G_n)$ under $G_n$. Efron (1979) showed, via numerical studies, that in several examples $B_n$ provides a better approximation to the d.f. $\Gamma_n$ of $T_n(\xi_n, G)$ under $G$ than the normal approximation does. Singh (1981) substantiated this observation by proving that, in the case of the standardized sample mean, the bootstrap estimate $B_n$ is second order accurate, i.e.,

(1) $\sup\{|\Gamma_n(x) - B_n(x)|;\ x \in \mathbb{R}\} = o(n^{-1/2})$, a.s.

Recall that the Edgeworth expansion or the Berry-Esseen bound gives

$\sup\{|\Gamma_n(x) - \Phi(x)|;\ x \in \mathbb{R}\} = O(n^{-1/2})$,

where $\Phi$ is the d.f. of a N(0, 1) r.v. See, e.g., Feller (1966, Ch. XVI). Babu and Singh (1983, 1984), among others, pointed out that this phenomenon is shared by a large class of statistics. For further reading on bootstrapping we refer the reader to Efron (1982).
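The definition of $B_n$ can be made concrete by Monte Carlo. The Python sketch below is an illustration and is not taken from the text. It uses $T_n = n^{1/2}(\bar\xi_n - \mu(G))/\sigma(G)$, the standardized sample mean of Singh's result. It draws resamples $\zeta_n$ from $G_n$ and returns the replicates $T_n(\zeta_n, G_n)$, whose empirical d.f. approximates $B_n$. The number of resamples and the function names are assumptions of the sketch.

```python
import numpy as np


def bootstrap_standardized_mean(xi, n_boot=2000, seed=None):
    """Monte Carlo replicates of T_n(zeta_n, G_n) for the standardized mean.

    T_n(xi_n, G) = n^{1/2}(mean(xi) - mu(G)) / sd(G); under G_n this becomes
    n^{1/2}(mean(zeta) - mean(xi)) / sd_n(xi), zeta_1..zeta_n i.i.d. G_n.
    Returns the sorted replicates; their e.d.f. approximates B_n.
    """
    rng = np.random.default_rng(seed)
    xi = np.asarray(xi, dtype=float)
    n = xi.size
    mu_n, sd_n = xi.mean(), xi.std()             # mean and sd of G_n (divisor n)
    idx = rng.integers(0, n, size=(n_boot, n))   # n_boot resamples from G_n
    reps = np.sqrt(n) * (xi[idx].mean(axis=1) - mu_n) / sd_n
    return np.sort(reps)


def ecdf(sorted_vals, x):
    """Empirical d.f. of the sorted replicates, evaluated at x."""
    return np.searchsorted(sorted_vals, x, side="right") / sorted_vals.size
```

For a skewed $G$, such as the exponential, the replicates inherit the skewness of $\Gamma_n$, which the N(0, 1) approximation cannot reflect. This illustrates, but does not prove, the kind of improvement quantified by (1).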


We now turn to the problem of bootstrapping M-estimators in a 
linear regression model. For the sake of clarity we shall restrict our attention 
to a simple linear regression model only. Our main purpose is to show how a 
certain weighted empirical sampling distribution naturally helps to overcome 
some inherent difficulties in defining bootstrap M-estimators. What follows
is based on the work of Lahiri (1989). No proofs will be given as they involve 
intricate technicalities of the Edgeworth expansion for independent 
non-identically distributed r.v.’s. 


Accordingly, assume that $\{e_i, i \ge 1\}$ are i.i.d. $F$ r.v.'s, $\{x_{ni}, i \ge 1\}$ are known design points, and $\{Y_{ni}, i \ge 1\}$ are observable r.v.'s such that, for some $\beta \in \mathbb{R}$,
(2) $Y_{ni} = x_{ni}\beta + e_i$, $i \ge 1$.


The score function $\psi$ is assumed to satisfy

(3) $\int \psi\, dF = 0$.

Let $\hat\Delta_n$ be an M-estimator obtained as a solution $t$ of
(4) $\sum_{i=1}^n x_{ni}\, \psi(Y_{ni} - x_{ni} t) = 0$


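and let $F_n$ be an estimator of $F$ based on the residuals $\hat e_{ni} := Y_{ni} - x_{ni}\hat\Delta_n$, $1 \le i \le n$. Let $\{e^*_{ni}, 1 \le i \le n\}$ be i.i.d. $F_n$ r.v.'s and define

(5) $Y^*_{ni} := x_{ni}\hat\Delta_n + e^*_{ni}$, $1 \le i \le n$.
The bootstrap M-estimator $\Delta^*_n$ is defined to be a solution $t$ of

(6) $\sum_{i=1}^n x_{ni}\, \psi(Y^*_{ni} - x_{ni} t) = 0$.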


Recall from the previous section that, in general, (3) ensures the absence of asymptotic bias in $\hat\Delta_n$. Analogously, to ensure the absence of asymptotic bias in $\Delta^*_n$, we need $F_n$ such that

(7) $\int \psi\, dF_n = E_n\psi(e^*_{n1}) = 0$,


where $E_n$ is the expectation under $F_n$. In general, the choice of $F_n$ that will satisfy (7) and at the same time be a reasonable estimator of $F$ depends heavily on the forms of $\psi$ and $F$. When bootstrapping the least squares estimator of $\beta$, i.e., when $\psi(x) = x$, Freedman (1981) ensures (7) by choosing $F_n$ to be the empirical d.f. of the centered residuals $\{\hat e_{ni} - \hat e_{n\cdot}, 1 \le i \le n\}$, where $\hat e_{n\cdot} := n^{-1}\sum_{j=1}^n \hat e_{nj}$. In fact, he shows that if one does not center the residuals, the bootstrap distribution of the least squares estimator does not approximate the corresponding original distribution.
Clearly, the ordinary empirical d.f. $H_n$ of the residuals $\{\hat e_{ni}; 1 \le i \le n\}$ does not ensure the validity of (7) for general designs and a general $\psi$. We are thus forced to look at appropriate modifications of the usual bootstrap. Here we describe two modifications. One chooses the resampling distribution appropriately, and the other modifies the defining equation (6) à la Shorack (1982). Both provide second order correct approximations to the distribution of the standardized $\hat\Delta_n$.
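Weighted Empirical Bootstrap:
Assume that the design points $\{x_{ni}\}$ are either all non-negative or all non-positive. Let $w_n := \sum_{i=1}^n |x_{ni}|$ be positive and define

(8) $F_{1n}(y) := w_n^{-1}\sum_{i=1}^n |x_{ni}|\, I(\hat e_{ni} \le y)$, $y \in \mathbb{R}$.
Take the resampling distribution $F_n$ to be $F_{1n}$. Then, clearly,

$\int \psi\, dF_{1n} = w_n^{-1}\sum_{i=1}^n |x_{ni}|\,\psi(\hat e_{ni}) = \pm w_n^{-1}\sum_{i=1}^n x_{ni}\,\psi(\hat e_{ni}) = 0$

by the definition of $\hat\Delta_n$. That is, $F_{1n}$ satisfies (7) for any $\psi$.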
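The weighted empirical bootstrap can be sketched in Python for the simple model (2) with positive design points. The sketch assumes several things that the text does not. It uses Huber's score with $k = 1.345$, which is nondecreasing, so the left side of (4) is nonincreasing in $t$ and its root can be found by bisection. The number of resamples is arbitrary, and the helper names are hypothetical. The check on the output is the identity (7) for $F_{1n}$.

```python
import numpy as np


def huber_psi(u, k=1.345):
    """Huber's score; nondecreasing and bounded."""
    return np.clip(u, -k, k)


def m_estimate_1d(x, y, psi=huber_psi, tol=1e-12):
    """Root of g(t) = sum_i x_i psi(y_i - x_i t), as in (4) and (6).

    g is nonincreasing in t because psi is nondecreasing; the root is
    bracketed by doubling and then located by bisection.
    """
    g = lambda t: np.sum(x * psi(y - x * t))
    lo, hi = -1.0, 1.0
    while g(lo) < 0:
        lo *= 2.0
    while g(hi) > 0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weighted_empirical_bootstrap(x, y, n_boot=200, seed=None, psi=huber_psi):
    """Resample residuals from F_1n of (8) and solve (6) for each resample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    delta_hat = m_estimate_1d(x, y, psi)          # solution of (4)
    resid = y - x * delta_hat                      # residuals e_hat_ni
    p = np.abs(x) / np.abs(x).sum()                # F_1n: mass |x_ni|/w_n at e_hat_ni
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e_star = rng.choice(resid, size=x.size, replace=True, p=p)   # i.i.d. F_1n
        y_star = x * delta_hat + e_star                              # (5)
        boot[b] = m_estimate_1d(x, y_star, psi)                      # (6)
    return delta_hat, resid, p, boot
```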
Modified Scores Bootstrap:

Let $F_n$ be any resampling distribution based on the residuals. Define the bootstrap estimator $\Delta^*_{ns}$ to be a solution $t$ of the equation

(9) $\sum_{i=1}^n x_{ni}\,[\psi(Y^*_{ni} - x_{ni} t) - E_n\psi(e^*_{n1})] = 0$.

In other words, the score function is now a priori centered under $F_n$, and hence (7) holds for any $F_n$ and any $\psi$.
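A corresponding sketch of the modified scores bootstrap resamples from the ordinary residual empirical d.f. $H_n$. It centers the score at $E_n\psi(e^*_{n1})$, the average of $\psi$ over the residuals, as in (9). No weighting is involved, so the design may have mixed signs, a case where $F_{1n}$ of (8) is unavailable. As before, Huber's score, the bisection solver and the function names are assumptions of the illustration. The bisection relies on $|E_n\psi(e^*_{n1})| < k$, so that the centered score changes sign.

```python
import numpy as np


def huber_psi(u, k=1.345):
    return np.clip(u, -k, k)


def root_nonincreasing(g, tol=1e-12, max_doublings=200):
    """Root of a nonincreasing scalar function g: bracket by doubling, then bisect."""
    lo, hi = -1.0, 1.0
    for _ in range(max_doublings):
        if g(lo) >= 0 and g(hi) <= 0:
            break
        if g(lo) < 0:
            lo *= 2.0
        if g(hi) > 0:
            hi *= 2.0
    else:
        raise ValueError("could not bracket a root")
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def modified_scores_bootstrap(x, y, n_boot=200, seed=None, psi=huber_psi):
    """Resample from the ordinary residual e.d.f. H_n and solve (9)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    delta_hat = root_nonincreasing(lambda t: np.sum(x * psi(y - x * t)))   # (4)
    resid = y - x * delta_hat
    c = psi(resid).mean()               # E_n psi(e*_n1) under H_n; nonzero in general
    boot = np.empty(n_boot)
    for b in range(n_boot):
        y_star = x * delta_hat + rng.choice(resid, size=x.size, replace=True)
        boot[b] = root_nonincreasing(
            lambda t: np.sum(x * (psi(y_star - x * t) - c)))   # centered score (9)
    return delta_hat, c, boot
```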


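We now describe the second order correctness of these procedures. To that effect we need some more notation and assumptions. To begin with, let $\tau_n^2 := \sum_{i=1}^n x_{ni}^2$ and define

$m_n := \max\{|x_{ni}|;\ 1 \le i \le n\}$, $\quad b_{1n} := \sum_{i=1}^n x_{ni}^3/\tau_n^3$, $\quad b_n := \sum_{i=1}^n |x_{ni}|^3/\tau_n^3$.

For a d.f. $F$ and any sampling d.f. $F_n$, define

$\gamma(x) := E\psi(e_1 - x)$, $\quad v(x) := \sigma^2(x) := E\{\psi(e_1 - x) - \gamma(x)\}^2$, $\quad \mu(x) := E\{\psi(e_1 - x) - \gamma(x)\}^3$, $x \in \mathbb{R}$,

$\gamma_n(x) := E_n\psi(e^*_{n1} - x)$, $\quad v_n(x) := \sigma_n^2(x) := E_n\{\psi(e^*_{n1} - x) - \gamma_n(x)\}^2$, $\quad \mu_n(x) := E_n\{\psi(e^*_{n1} - x) - \gamma_n(x)\}^3$, $x \in \mathbb{R}$,

$A_n(c) := \{i: 1 \le i \le n,\ |x_{ni}| > c\,\tau_n b_n\}$, $\quad k_n(c) := \#A_n(c)$, $c > 0$.

For any real valued function $g$ on $\mathbb{R}$, let $\dot g$ and $\ddot g$ denote its first and second derivatives at 0, respectively, whenever they exist. Also, write $\gamma_n$, $v_n$, etc. for $\gamma_n(0)$, $v_n(0)$, etc. Finally, let $a := -\dot\gamma/\sigma$, $a_n := -\dot\gamma_n/\sigma_n$, and define, for $x \in \mathbb{R}$, $H_2(x) := x^2 - 1$ and

$\Lambda(x) := \Phi(x) - b_{1n}\,\phi(x)\big[\{\ddot\gamma_n/\sigma_n - \dot\gamma_n\dot v_n/\sigma_n^3\}(x^2/2a_n^2) + (\mu_n/6\sigma_n^3)H_2(x)\big]$,

where $\phi$ denotes the N(0, 1) density.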

In the following theorems, a.s. means for almost all sequences $\{e_i; i \ge 1\}$ of i.i.d. $F$ r.v.'s.


Theorem 4.2b.1. Let the model (2) hold. In addition, assume that $\psi$ has a uniformly continuous bounded second derivative and that the following hold:

(a) $\tau_n \to \infty$. (b) $a > 0$.

(c) There exists a constant $0 < c < 1$ such that $\ln \tau_n = O(k_n(c))$.

(d) $m_n \ln \tau_n = o(\tau_n)$.

(e) There exist constants $\theta > 0$, $\delta > 0$ and $q < 1$ such that
$\sup\{|E\exp\{it\psi(e_1 - x)\}|:\ |x| < \delta,\ |t| > \theta\} < q$.

(f) For every $\lambda > 0$, $\sum_{n \ge 1} n^2 \exp(-\lambda w_n^2/\tau_n^2) < \infty$.

Then, with $\Delta^*_n$ defined as a solution of (6) with $F_n = F_{1n}$,

$\sup_y |P_{1n}(a_n\tau_n(\Delta^*_n - \hat\Delta_n) \le y) - \Lambda(y)| = o(m_n/\tau_n)$,

$\sup_y |P(a\tau_n(\hat\Delta_n - \beta) \le y) - P_{1n}(a_n\tau_n(\Delta^*_n - \hat\Delta_n) \le y)| = o(m_n/\tau_n)$, a.s.,

where $P_{1n}$ denotes the bootstrap probability under $F_{1n}$, and where the supremum is over $y \in \mathbb{R}$. □


Next we state the analogous result for $\Delta^*_{ns}$.

Theorem 4.2b.2. Suppose that all of the hypotheses of Theorem 4.2b.1 except (f) hold and that $\Delta^*_{ns}$ is defined as a solution of (9) with $F_n = H_n$, the ordinary empirical d.f. of the residuals. Then,

$\sup_y |P_n(a_n\tau_n(\Delta^*_{ns} - \hat\Delta_n) \le y) - \Lambda(y)| = o(m_n/\tau_n)$,

$\sup_y |P(a\tau_n(\hat\Delta_n - \beta) \le y) - P_n(a_n\tau_n(\Delta^*_{ns} - \hat\Delta_n) \le y)| = o(m_n/\tau_n)$, a.s.,

where $P_n$ denotes the bootstrap probability under $H_n$. □


The proofs of these theorems appear in Lahiri (1989), where he also discusses analogous results for a non-smooth $\psi$. In this case he chooses the sampling distribution to be a smooth estimator obtained from a kernel type density estimator. Lahiri (1991) gives extensions of the above theorems to multiple linear regression models.

Here we briefly comment on the assumptions (a)-(f). As is seen from the previous section, (a) and (b) are minimally required for the asymptotic normality of M-estimators. Assumptions (c), (e) and (f) are required to carry out the Edgeworth expansions, while (d) is slightly stronger than Noether's condition (NX) applied to (2). In particular, $x_{ni} \equiv 1$ and $x_{ni} = i$ satisfy (a), (c), (d) and (f).

A sufficient condition for (e) to hold is that $F$ have a positive density and $\psi$ have a continuous positive derivative on an open interval in $\mathbb{R}$.
Here we shall now discuss some robust scale estimators.


Definitions. An estimator $\hat\beta(X, Y)$ of $\beta$ based on the design matrix $X$ and the observation vector $Y$ is said to be location invariant if

(1) $\hat\beta(X, Y + Xb) = \hat\beta(X, Y) + b$, $\forall\, b \in \mathbb{R}^p$.
It is said to be scale invariant if
(2) $\hat\beta(X, aY) = a\hat\beta(X, Y)$, $\forall\, a \in \mathbb{R}$, $a \ne 0$.

A scale estimator $s(X, Y)$ of a scale parameter $\gamma$ is said to be location invariant if

(3) $s(X, Y + Xb) = s(X, Y)$, $\forall\, b \in \mathbb{R}^p$.
It is said to be scale invariant if

(4) $s(X, aY) = |a|\, s(X, Y)$, $\forall\, a \in \mathbb{R}$, $a \ne 0$.
Now observe that the M-estimators $\hat\Delta$ and $\hat\Delta'$ of $\beta$ of Section 4.2a are location invariant but not scale invariant. The estimators $\hat\Delta_s$, defined at (4.2a.13), are location and scale invariant whenever $s$ satisfies (3) and (4). Note that if $s$ does not satisfy (3), then $\hat\Delta_s$ need not be location invariant. Some of the candidates for $s$ are

(5) $s_0 := \{(n - p)^{-1}\sum_i (Y_i - x_i'\hat\beta)^2\}^{1/2}$,
$s_1 := \mathrm{med}\{|Y_i - x_i'\hat\beta|;\ 1 \le i \le n\}$,
$s_2 := \mathrm{med}\{|Y_i - Y_j - (x_i - x_j)'\hat\beta|;\ 1 \le i < j \le n\}$,

where $\hat\beta$ is a preliminary estimator of $\beta$ satisfying (1) and (2).


Estimator $s_0$, with $\hat\beta$ the least squares estimator, is the square root of the usual estimator of the error variance, assuming the variance exists. It is known to be non-robust against outliers in the errors. In robustness studies one needs scale estimators that are not sensitive to outliers in the errors. Estimator $s_1$ has been mentioned by Huber (1981, p. 175) as one such candidate. The asymptotic properties of $s_1$ and $s_2$ will be discussed shortly. Here we just mention that each of these estimators estimates a different scale parameter. That is not a point of concern if our goal is only to have location and scale invariant M-estimators of $\beta$.
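The three candidates in (5) are easy to compute. The Python sketch below writes the first candidate in (5) as $s_0$ (with the square root, so that it is on the scale of the errors). It uses the least squares fit as the preliminary $\hat\beta$; this fit satisfies (1) and (2). It evaluates $s_0$, $s_1$ and $s_2$ and lets the invariance properties (3) and (4) be checked numerically. The function name is hypothetical.

```python
import numpy as np


def scale_estimates(X, Y):
    """s_0, s_1, s_2 of (5), with the least squares fit as preliminary estimate."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    r = Y - X @ beta_hat                         # residuals Y_i - x_i' beta_hat
    s0 = np.sqrt(np.sum(r ** 2) / (n - p))
    s1 = np.median(np.abs(r))
    i, j = np.triu_indices(n, k=1)
    s2 = np.median(np.abs(r[i] - r[j]))          # = |Y_i - Y_j - (x_i - x_j)' beta_hat|
    return s0, s1, s2
```

Since $r_i - r_j = Y_i - Y_j - (x_i - x_j)'\hat\beta$, $s_2$ needs only the residuals. Forming all pairs costs $O(n^2)$ memory, which is fine for moderate $n$.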

An alternative way of obtaining location and scale invariant M-estimators of $\beta$ is to use the simultaneous M-estimation method for estimating $\beta$ and $\gamma$ of (2.3.38), as discussed in Huber (1981). We mention here, without giving details, that it is possible to study the asymptotic joint distribution of these estimators under heteroscedastic errors by using the results of Chapter 2.

We shall now study the asymptotic distributions of $s_1$ and $s_2$ under the model (1.1.1). With $F_i$ denoting the d.f. of $e_i$ and $H := n^{-1}\sum_i F_i$, let

(6) $p_1(y) := H(y) - H(-y)$,

(7) $p_2(y) := \int [H(y + x) - H(-y + x)]\, dH(x)$, $y \ge 0$.
Define $\gamma_1$ and $\gamma_2$ by the relations

(8) $p_1(\gamma_1) = 1/2$,

(9) $p_2(\gamma_2) = 1/2$.

Note that in the case $F_i \equiv F$, $\gamma_1$ is the median of the distribution of $|e_1|$ and $\gamma_2$ is the median of the distribution of $|e_1 - e_2|$. In general, $\gamma_j$, $p_j$, etc., depend on $n$, but we suppress this for the sake of convenience.

The asymptotic distribution of $s_j$, $j = 1, 2$, is obtained by the usual method of connecting the event $\{s_j \le a\}$ with certain events based on certain empirical processes, as is done when studying the asymptotic distribution of the sample median. Accordingly, let


(10) $S(y) := \sum_i I(|Y_i - x_i'\hat\beta| \le y)$, $\quad T(y) := \sum_{1 \le i < j \le n} I(|Y_i - Y_j - (x_i - x_j)'\hat\beta| \le y)$, $y \ge 0$.

Then, for an $a > 0$,

(11) $\{s_1 \le a\} = \{S(a) \ge (n + 1)2^{-1}\}$, $n$ odd,
$\{S(a) > n2^{-1}\} \subset \{s_1 \le a\} \subset \{S(a) > n2^{-1} - 1\}$, $n$ even.

Similarly, for an $a > 0$,

(12) $\{s_2 \le a\} = \{T(a) \ge (N + 1)2^{-1}\}$, $N := n(n - 1)/2$ odd,
$\{T(a) > N2^{-1}\} \subset \{s_2 \le a\} \subset \{T(a) > N2^{-1} - 1\}$, $N$ even.
Thus, to study the asymptotic distributions of $s_j$, $j = 1, 2$, it suffices to study those of $S(y)$ and $T(y)$, $y > 0$.
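The count relations (11) are easy to confirm numerically. For integer-valued $S(a)$ they say the following. For $n$ odd, $s_1 \le a$ exactly when $S(a) \ge (n+1)/2$. For $n$ even, the median is taken as the average of the two middle order statistics (the convention assumed here). Then $S(a) \ge n/2 + 1$ implies $s_1 \le a$, which in turn implies $S(a) \ge n/2$. The Python check below evaluates these relations on arbitrary absolute residuals, at every data point and at every midpoint between consecutive ones. It is an illustration only, and the function names are hypothetical.

```python
import numpy as np


def s1_and_count(abs_resid, a):
    """Return (s1 <= a, S(a)) with s1 = median |residual| and S(a) as in (10)."""
    abs_resid = np.asarray(abs_resid, dtype=float)
    s1 = np.median(abs_resid)             # averages the two middle values if n is even
    return bool(s1 <= a), int(np.sum(abs_resid <= a))


def check_relations_11(abs_resid):
    """Verify (11) at every data point and every midpoint between neighbours."""
    r = np.sort(np.asarray(abs_resid, dtype=float))
    n = r.size
    grid = np.concatenate([r, 0.5 * (r[1:] + r[:-1])])
    for a in grid:
        event, S = s1_and_count(r, a)
        if n % 2 == 1:
            assert event == (S >= (n + 1) / 2)
        else:
            if S >= n / 2 + 1:
                assert event
            if event:
                assert S >= n / 2
    return True
```

The same argument, with $T(a)$ and $N = n(n-1)/2$ in place of $S(a)$ and $n$, gives (12).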
In what follows we shall use the notation of Chapter 2 with the following modifications. As before, we shall write $S_1$, $\mu_1$, etc. for $S_d$, $\mu_d$, etc. of (2.3.1) whenever $d_{ni} \equiv n^{-1/2}$. Moreover, in (2.3.1), we shall take

(13) $X_{ni} := Y_i - x_i'\beta = e_i$, $\quad c_{ni} := A x_i$, $1 \le i \le n$.
With these modifications, for all $n \ge 1$,

(14) $n^{-1/2}S(y) = S_1(y, v) - S_1(-y, v) = n^{-1/2}\sum_i I(|e_i - c_i'v| \le y)$,

$2n^{-1}T(y) = \int [S_1(y + x, v) - S_1(-y + x, v)]\, S_1(dx, v) - 1$, $y \ge 0$,
with probability 1, where $v := A^{-1}(\hat\beta - \beta)$. Let

(15) $\mu_1(y, u) := \mu_1(H(y), u)$, $\quad Y_1(y, u) := Y_1(H(y), u)$, $-\infty < y < \infty$;

$W(y, u) := Y_1(y, u) - Y_1(-y, u)$,
$K(y, u) := \int [Y_1(y + x, u) - Y_1(-y + x, u)]\, dH(x)$, $y \ge 0$, $u \in \mathbb{R}^p$.
We shall write $W(y)$, $K(y)$, etc. for $W(y, 0)$, $K(y, 0)$, etc.
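Theorem 4.3.1. Assume that (1.1.1) holds with $X$ and $\{F_i\}$ satisfying (NX) and (2.3.3). Moreover, assume that $H$ is strictly increasing for each $n$ and that

(16) $\lim_{\delta \to 0} \limsup_n \sup_{0 \le s \le 1 - \delta} [H(H^{-1}(s + \delta) + \gamma_2) - H(H^{-1}(s) + \gamma_2)] = 0$.

About $\hat\beta$ assume that
(17) $\|A^{-1}(\hat\beta - \beta)\| = O_p(1)$.
Then, $\forall\, a \in \mathbb{R}$,
(18) $P(n^{1/2}(s_1 - \gamma_1) \le a\gamma_1)$

$= P\big(W(\gamma_1) + n^{-1/2}\sum_i x_i'A\{f_i(\gamma_1) - f_i(-\gamma_1)\}\, v \ge -a\gamma_1 n^{-1}\sum_i [f_i(\gamma_1) + f_i(-\gamma_1)]\big) + o(1)$,

(19) $P(n^{1/2}(s_2 - \gamma_2) \le a\gamma_2)$

$= P\big(2K(\gamma_2) + n^{-3/2}\sum_i\sum_j c_{ij}\int [f_i(\gamma_2 + x) - f_i(-\gamma_2 + x)]\, dF_j(x)\, v \ge -a\gamma_2 n^{-1}\sum_i \int [f_i(\gamma_2 + x) + f_i(-\gamma_2 + x)]\, dH(x)\big) + o(1)$,

where $c_{ij} := (x_i - x_j)'A$, $1 \le i, j \le n$.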


Proof. We shall give the proof of (19) only, that of (18) being similar and less involved. Fix an $a \in \mathbb{R}$ and let $Q_n(a)$ denote the left hand side of (19). Assume that $n$ is large enough so that $a_n := (an^{-1/2} + 1)\gamma_2 > 0$. Then, by (12),

(20) $Q_n(a) = P(T(a_n) \ge (N + 1)/2)$, $N$ odd ($N := n(n - 1)/2$),
$P(T(a_n) > N/2) \le Q_n(a) \le P(T(a_n) > N/2 - 1)$, $N$ even.

It thus suffices to study $P(T(a_n) \ge N2^{-1} + b)$, $b \in \mathbb{R}$. Now, let

$T_1(y) := n^{-1/2}(2n^{-1}T(y) + 1) - n^{1/2}p_2(y)$, $y \ge 0$,
$k_n := (N + 2b)n^{-3/2} + n^{-1/2} - n^{1/2}p_2(a_n)$.
Then, direct calculations show that

(21) $P(T(a_n) \ge N2^{-1} + b) = P(T_1(a_n) \ge k_n)$.


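We now analyze $k_n$. By (9),

$k_n = -n^{1/2}(p_2(a_n) - p_2(\gamma_2)) + O(n^{-1/2})$.
But

$n^{1/2}(p_2(a_n) - p_2(\gamma_2))$
$= n^{1/2}\int [\{H(a_n + x) - H(\gamma_2 + x)\} - \{H(-a_n + x) - H(-\gamma_2 + x)\}]\, dH(x)$.

By (2.3.3), the sequence of distributions $\{p_2\}$ is tight on $(\mathbb{R}, \mathcal{B})$, implying that $\gamma_2 = O(1)$ and $n^{1/2}(a_n - \gamma_2) = O(1)$. Consequently, in view of (2.3.3),

$n^{1/2}\int \{H(\pm a_n + x) - H(\pm\gamma_2 + x)\}\, dH(x)$
$= \pm a\gamma_2 n^{-1}\sum_i \int f_i(\pm\gamma_2 + x)\, dH(x) + o(1)$,
and

(22) $k_n = -a\gamma_2 n^{-1}\sum_i \int [f_i(\gamma_2 + x) + f_i(-\gamma_2 + x)]\, dH(x) + o(1)$.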


Next, we approximate $T_1(a_n)$ by a sum of independent r.v.'s. The proof is similar to the one used in approximating linear rank statistics in Section 3.4. From the definition of $T_1$ and (14),

(23) $T_1(y) = n^{-1/2}\int [S_1(y + x, v) - S_1(-y + x, v)]\, S_1(dx, v) - n^{1/2}p_2(y)$
$= n^{-1/2}\int [Y_1(y + x, v) - Y_1(-y + x, v)]\, S_1(dx, v)$
$+ n^{-1/2}\int [\mu_1(y + x, v) - \mu_1(-y + x, v)]\, Y_1(dx, v)$
$+ n^{-1/2}\int [\mu_1(y + x, v) - \mu_1(-y + x, v)]\, \mu_1(dx, v) - n^{1/2}p_2(y)$

$= E_1(y) + E_2(y) + E_3(y)$, say.
But

(24) $E_3(y) := n^{-1/2}\int [\mu_1(y + x, v) - \mu_1(-y + x, v)]\, \mu_1(dx, v) - n^{1/2}p_2(y)$

$= n^{-3/2}\sum_i\sum_j \int \{F_i(y + x + c_{ij}v) - F_i(-y + x + c_{ij}v) - F_i(y + x) + F_i(-y + x)\}\, dF_j(x)$
$= n^{-3/2}\sum_i\sum_j c_{ij}v \int [f_i(y + x) - f_i(-y + x)]\, dF_j(x) + o_p(1)$,

by (2.3.3), (NX) and (17). In this proof, $o_p(1)$ means $o_p(1)$ uniformly in $|y| \le k$, for every $0 < k < \infty$.

Integration by parts, (17), (2.3.25), $H$ increasing and the fact that $n^{-1/2}\int \mu_1(dx, v) = 1$ yield that
(25) $E_2(y) := n^{-1/2}\int \{\mu_1(y + x, v) - \mu_1(-y + x, v)\}\, Y_1(dx, v)$
$= n^{-1/2}\int \{Y_1(y + x, v) - Y_1(-y + x, v)\}\, \mu_1(dx, v)$
$= \int \{Y_1(y + x) - Y_1(-y + x)\}\, dH(x) + o_p(1)$.
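Similarly,

(26) $E_1(y) = n^{-1/2}\int \{Y_1(y + x) - Y_1(-y + x)\}\, S_1^0(dx) + o_p(1)$,

where $S_1^0(\cdot) := S_1(\cdot, 0)$. Now observe that $n^{-1/2}S_1^0 = H_n$, the ordinary empirical d.f. of the errors $\{e_i\}$. Let

$E_4(y) := \int \{Y_1(y + x) - Y_1(-y + x)\}\, d(H_n(x) - H(x)) = \Lambda(y) - \Lambda(-y)$,

where

$\Lambda(y) := \int Y_1(y + x)\, d[H_n(x) - H(x)]$, $y \in \mathbb{R}$.
We shall show that
(27) $\Lambda(\pm a_n) = o_p(1)$.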


But

$|\Lambda(\pm a_n) - \Lambda(\pm\gamma_2)| = |\int [Y_1(H(\pm a_n + x)) - Y_1(H(\pm\gamma_2 + x))]\, d(H_n(x) - H(x))|$
$\le 2\sup\{|Y_1(H(y)) - Y_1(H(z))|:\ |y - z| \le |a|\gamma_2 n^{-1/2}\}$

(28) $= o_p(1)$,

because of (2.3.3) and Corollary 2.3.1 applied with $d_{ni} \equiv n^{-1/2}$. Thus, to prove (27), it suffices to show that
(29) $\Lambda(\pm\gamma_2) = o_p(1)$.
But

$|\Lambda(\pm\gamma_2)| = |\int_0^1 [Y_1(H(\pm\gamma_2 + H_n^{-1}(t))) - Y_1(H(\pm\gamma_2 + H^{-1}(t)))]\, dt|$
$\le \sup_{0 < t < 1} |Y_1(H(\pm\gamma_2 + H^{-1}(HH_n^{-1}(t)))) - Y_1(H(\pm\gamma_2 + H^{-1}(t)))|$
$= o_p(1)$,

by the assumption (16), Lemma 3.4.1 and Corollary 2.3.1 applied with $d_{ni} \equiv n^{-1/2}$. This proves (27). Consequently, from (26) and an argument like (28), it follows that

(30) $E_1(a_n) = \int \{Y_1(a_n + x) - Y_1(-a_n + x)\}\, dH(x) + o_p(1)$
$= \int \{Y_1(\gamma_2 + x) - Y_1(-\gamma_2 + x)\}\, dH(x) + o_p(1)$.
From (23), (24), (25), (30) and the definition (15), we obtain
(31) $T_1(a_n) = 2K(\gamma_2) + n^{-3/2}\sum_i\sum_j c_{ij}\int \{f_i(\gamma_2 + x) - f_i(-\gamma_2 + x)\}\, dF_j(x)\, v + o_p(1)$.
Now, from the definition of $k_n$ and (22), it follows that $\lim k_n$ does not depend on $b$. Thus the limit of the l.h.s. of (21) is the same for every fixed $b$, and, in view of (20), (21), (22) and (31), it is given by the first term on the r.h.s. of (19). □


Remark 4.3.1. Observe that, in view of (8) and (9),
$W(\gamma_1) = n^{-1/2}\sum_i \{I(|e_i| \le \gamma_1) - 1/2\}$,
$K(\gamma_2) = \int \{H(\gamma_2 + x) - H(-\gamma_2 + x)\}\, dY_1(x)$

$= n^{-1/2}\sum_i \{H(\gamma_2 + e_i) - H(-\gamma_2 + e_i) - 1/2\}$.

Thus, $W(\gamma_1)$ and $K(\gamma_2)$ are sums of bounded independent r.v.'s with mean zero, and by the L-F CLT one obtains

(32) $\sigma_1^{-1}W(\gamma_1) \to_d N(0, 1)$ and $\sigma_2^{-1}K(\gamma_2) \to_d N(0, 1)$,
where

$\sigma_1^2 := \mathrm{Var}\, W(\gamma_1) = n^{-1}\sum_i \{F_i(\gamma_1) - F_i(-\gamma_1)\}\{1 - F_i(\gamma_1) + F_i(-\gamma_1)\}$,
$\sigma_2^2 := \mathrm{Var}\, K(\gamma_2) = n^{-1}\sum_i \int [H(\gamma_2 + x) - H(-\gamma_2 + x)]^2\, dF_i(x) - 1/4$.
Remark 4.3.2. If $\{F_i\}$ are all symmetric about zero, then from (32), (18) and (19) it follows that the asymptotic distributions of $s_1$ and $s_2$ do not depend on the initial estimator $\hat\beta$ of $\beta$. In fact, in this case we can deduce that

(33) $\tau_1^{-1}n^{1/2}(s_1 - \gamma_1)\gamma_1^{-1} \to_d N(0, 1)$,
$\tau_2^{-1}n^{1/2}(s_2 - \gamma_2)\gamma_2^{-1} \to_d N(0, 1)$,
where
$\tau_1^2 := \sigma_1^2\{2\gamma_1 h(\gamma_1)\}^{-2}$, $\quad h(x) := n^{-1}\sum_i f_i(x)$,
$\tau_2^2 := \sigma_2^2\{\gamma_2 \int h(\gamma_2 + x)\, dH(x)\}^{-2}$. □


Remark 4.3.3. The i.i.d. case. In the case $F_i \equiv F$, the asymptotic distribution of $s_1$ depends on $\hat\beta$ unless $F$ is symmetric around zero. However, the asymptotic distribution of $s_2$ does not depend on $\hat\beta$. This is so because in this case the coefficient of $v$ in (19) is

$n^{-3/2}\sum_i\sum_j (x_i - x_j)'A \int [f(\gamma_2 + x) - f(-\gamma_2 + x)]\, dF(x) = 0$.

That the asymptotic distribution of $s_2$ is independent of $\hat\beta$ is not surprising, because $s_2$ is essentially a symmetrized variant of $s_1$. We summarize this property of $s_2$ as
Corollary 4.3.1. If, in model (1.1.1), $F_i \equiv F$, $F$ satisfies (F1), (F2) and $X$ satisfies (NX), then $\tau_3^{-1}n^{1/2}(s_2 - \gamma_2)\gamma_2^{-1} \to_d N(0, 1)$, where

$\tau_3^2 := \{\int [F(\gamma_2 + x) - F(-\gamma_2 + x)]^2\, dF(x) - 1/4\}\{\gamma_2\int f(\gamma_2 + x)\, dF(x)\}^{-2}$. □

Note that $\gamma_2$ is now the median of the distribution of $|e_1 - e_2|$. Also, observe that the condition (16) now is equivalent to

$\sup_{0 \le s \le 1 - \delta} [P(F(e_1 - y) \le s + \delta) - P(F(e_1 - y) \le s)] \to 0$ as $\delta \to 0$, $\forall\, y \in \mathbb{R}$,

which is implied by the assumptions on $F$. □
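As a numerical illustration of Corollary 4.3.1 (not part of the original discussion), take $F = N(0, 1)$. Then $e_1 - e_2 \sim N(0, 2)$, so $\gamma_2 = \sqrt2\,\Phi^{-1}(3/4) \approx 0.954$. Also, $\int f(\gamma_2 + x)\, dF(x) = e^{-\gamma_2^2/4}/(2\sqrt\pi)$, the $N(0, 2)$ density at $\gamma_2$. The Python sketch below evaluates the integrals in $\tau_3^2$ by the trapezoidal rule on a truncated grid, and checks the closed form and the defining relation (9). The truncation at $\pm 12$, the grid size and the helper names are choices of the sketch.

```python
import numpy as np
from math import erf, exp, pi, sqrt

_Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))   # N(0,1) d.f.


def norm_cdf(z):
    return _Phi(z)


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)


def norm_quantile(prob, lo=-10.0, hi=10.0, tol=1e-13):
    """Inverse of the N(0,1) d.f. by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def trapezoid(fvals, x):
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))


def tau3_squared_normal(grid_size=24001, half_width=12.0):
    """gamma_2, sigma_2^2, int f(gamma_2 + x) dF(x) and tau_3^2 for F = N(0,1)."""
    g2 = sqrt(2.0) * norm_quantile(0.75)              # median of |e1 - e2|
    x = np.linspace(-half_width, half_width, grid_size)
    fx = norm_pdf(x)
    band = norm_cdf(g2 + x) - norm_cdf(-g2 + x)        # F(g2 + x) - F(-g2 + x)
    sigma2_sq = trapezoid(band ** 2 * fx, x) - 0.25
    dens = trapezoid(norm_pdf(g2 + x) * fx, x)
    tau3_sq = sigma2_sq / (g2 * dens) ** 2
    return g2, sigma2_sq, dens, tau3_sq
```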


4.4. R-ESTIMATORS OF β.


Consider the model (1.1.1) and the vector of linear rank statistics

(1) $T(t) := A_1 \sum_i (x_{ni} - \bar x_n)\,\varphi(R_{it}/(n + 1))$, $t \in \mathbb{R}^p$,
where $A_1$ is as in (4.2a.12) and $R_{it}$ is the rank of $Y_{ni} - x_{ni}'t$ among $\{Y_{nj} - x_{nj}'t,\ 1 \le j \le n\}$.
One of the classes of R-estimators of $\beta$ is defined by the relation

(2) $\inf_t |T(t)|_1 = |T(\hat\beta_1)|_1$, $\quad |T(t)|_1 := \sum_{j=1}^p |T_j(t)|$,

$T_j$ being the $j$th component of $T$ of (1). The estimators $\hat\beta_1$ were initially studied by Adichie (1967) for the case $p = 1$ and by Jurečková (1971) for $p \ge 2$.

Another class of R-estimators can be defined by the relation

(3) $\inf_t \|T(t)\| = \|T(\hat\beta_2)\|$.


Yet another class of estimators, introduced by Jaeckel (1972), is defined by the relation

(4) $\inf_t J(t) = J(\hat\beta_3)$,
where
(5) $J(t) := \sum_i (Y_{ni} - x_{ni}'t)\,\varphi(R_{it}/(n + 1))$, $t \in \mathbb{R}^p$.

Jaeckel (op. cit.) showed that, for every observation vector $(Y_1, \dots, Y_n)$ and for every $n > p$, $\sum_i \varphi(i/(n + 1)) = 0$ implies that $J(t)$ is a nonnegative, continuous and convex function of $t$. If, in addition, $X_c$ has full rank $p$, then the set $\{t: J(t) \le b\}$ is bounded for every $0 \le b < \infty$, where $X_c$ is defined at (4.2a.11). Consequently, $\hat\beta_3$ exists. Moreover, the almost everywhere derivative of $J$ w.r.t. $t$ is $-A_1^{-1}T(t)$. Thus, at $\hat\beta_3$, $T$ is nearly equal to zero, and hence $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$ are essentially the same estimators. Jaeckel showed, using the a.u.l. property of $T(t)$ due to Jurečková (1971), that indeed $\|A_1^{-1}(\hat\beta_1 - \hat\beta_3)\| = o_p(1)$.
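For $p = 1$, Jaeckel's dispersion (5) is piecewise linear and convex in $t$. Its kinks occur only where two residuals $Y_i - x_i t$ and $Y_j - x_j t$ tie, i.e., at the pairwise slopes $(Y_j - Y_i)/(x_j - x_i)$, so a minimizer $\hat\beta_3$ can be found among these slopes. The Python sketch below assumes the Wilcoxon-type score $\varphi(u) = u - 1/2$, which satisfies $\sum_i \varphi(i/(n+1)) = 0$. It evaluates $J$ at all pairwise slopes, at a cost of $O(n^3 \log n)$, so it is meant for illustration only. Ties among residuals are broken by index when ranking; this is harmless because $J$ is continuous. Because $\sum_i \varphi(i/(n+1)) = 0$, adding a constant to all residuals leaves $J$ unchanged, so this criterion does not estimate an intercept. The function names are hypothetical.

```python
import numpy as np


def jaeckel_dispersion(x, y, t, phi=lambda u: u - 0.5):
    """J(t) of (5) for p = 1: sum_i (y_i - x_i t) phi(R_it/(n+1))."""
    z = y - x * t
    n = z.size
    ranks = np.empty(n)
    ranks[np.argsort(z, kind="stable")] = np.arange(1, n + 1)   # ties broken by index
    return float(np.sum(z * phi(ranks / (n + 1))))


def jaeckel_slope(x, y, phi=lambda u: u - 0.5):
    """Minimize J over the pairwise slopes, where its kinks lie."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    keep = x[i] != x[j]
    slopes = (y[j] - y[i])[keep] / (x[j] - x[i])[keep]
    values = np.array([jaeckel_dispersion(x, y, t, phi) for t in slopes])
    return float(slopes[np.argmin(values)])
```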


Here we shall discuss the asymptotic distribution of $\{\hat\beta_2\}$ under general heteroscedastic errors. The main tool is the a.u.l. Theorem 3.2.4. We shall also conclude that $\|A_1^{-1}(\hat\beta_2 - \hat\beta_3)\| = o_p(1)$ under (1.1.1) with general independent errors.


To begin with, note that $T$ of (1) is a $p$-vector $(T_1, \dots, T_p)'$, where $T_j(t)$ is a $T_d(\varphi, u)$-statistic of (3.1.2) with

(6) $X_{ni} := Y_{ni} - x_{ni}'\beta$, $\quad c_{ni} := A_1(x_{ni} - \bar x_n)$, $\quad u := A_1^{-1}(t - \beta)$,
$d_{ni} := a_{(j)}'(x_{ni} - \bar x_n)$, $1 \le i \le n$, $\quad a_{(j)} :=$ the $j$th column of $A_1$, $1 \le j \le p$.
Thus specializing Theorem 3.2.4 to this case readily gives

Lemma 4.4.1. Suppose that (1.1.1) holds with $F_{ni}$ as the d.f. of $e_{ni}$, $1 \le i \le n$. In addition, assume that

(NX$_c$) $(X_c'X_c)^{-1}$ exists for all $n \ge p$, and $\max_i (x_{ni} - \bar x_n)'(X_c'X_c)^{-1}(x_{ni} - \bar x_n) = o(1)$.

About $\{F_{ni}\}$ assume that $H$ is strictly increasing for each $n$, that (2.2.3b), (3.2.12), (3.2.35), (3.2.36) hold, and that

(7) $\lim_{\delta \to 0} \limsup_n \sup_{0 \le s \le 1 - \delta} [L_j(s + \delta) - L_j(s)] = 0$, $j = 1, \dots, p$,
where

$L_j(s) := \sum_i (a_{(j)}'(x_{ni} - \bar x_n))^2 F_{ni}(H^{-1}(s))$, $0 \le s \le 1$, $1 \le j \le p$.
Then, for every $0 < B < \infty$,

(8) $\sup \|T(t) - T(\beta) + K_n A_1^{-1}(t - \beta)\| = o_p(1)$,

where the supremum is over $\varphi \in \Phi$ and $\|A_1^{-1}(t - \beta)\| \le B$, and where

$K_n := A_1 \sum_i \int_0^1 (x_{ni} - \bar x_n(s))(x_{ni} - \bar x_n)'\, q_{ni}(s)\, d\varphi(s)\, A_1$,
$\bar x_n(s) := n^{-1}\sum_i x_{ni}\tilde f_{ni}(s)$,
$\tilde f_{ni}(s)$ as in (3.2.35), and $q_{ni}(s) := f_{ni}(H^{-1}(s))$, $1 \le i \le n$, $0 \le s \le 1$. □


In order to prove the asymptotic normality of $\hat\beta_2$, we need to show that $\|A_1^{-1}(\hat\beta_2 - \beta)\| = O_p(1)$. To this effect let

$\mu := A_1 \sum_i (x_{ni} - \bar x_n) \int F_{ni}(H^{-1})\, d\varphi$, $\quad S := T(\beta) - \mu$.

Observe that the distribution of $\hat\beta_2 - \beta$ does not depend on $\beta$, even when $\{e_{ni}\}$ are not identically distributed.


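Lemma 4.4.2. In addition to the assumptions of Lemma 4.4.1, suppose that

(9) $\|S + \mu\| = O_p(1)$,

(10) $\liminf_n \inf_{\|\theta\| = 1} \theta'K_n\theta \ge \alpha$ for an $\alpha > 0$,
(11) $K_n^{-1}$ exists for all $n \ge p$, and $\|K_n^{-1}\| = O(1)$.

Then, for every $\epsilon > 0$ and $0 < z < \infty$, there exist a $0 < b < \infty$ and $N_\epsilon$ such that
(12) $P(\inf_{\|u\| > b} \|T(A_1u + \beta)\|^2 > z) \ge 1 - \epsilon$, $n \ge N_\epsilon$.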


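Proof. Fix an $\epsilon > 0$ and $0 < z < \infty$. Without loss of generality assume $\beta = 0$. Observe that, by the C-S inequality,

$\inf_{\|u\| > b} \|T(A_1u)\|^2 \ge \inf_{\|\theta\| = 1,\ |r| > b} (\theta'T(rA_1\theta))^2$.

Thus it suffices to prove that there exist a $0 < b < \infty$ and $N_\epsilon$ such that

(13) $P(\inf_{\|\theta\| = 1,\ |r| > b} (\theta'T(rA_1\theta))^2 > z) \ge 1 - \epsilon$, $n \ge N_\epsilon$.

Let, for $t \in \mathbb{R}^p$, $\tilde T(t) := T(0) - K_nA_1^{-1}t$, so that, by (8), for every $0 < B < \infty$,

(14) $\sup_{\|\theta\| = 1,\ |r| \le B} |\theta'T(rA_1\theta) - \theta'\tilde T(rA_1\theta)| = o_p(1)$.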
But
$\theta'\tilde T(rA_1\theta) = \theta'(S + \mu) - r\,\theta'K_n\theta$.
By (9), there exist a $K_\epsilon$ and an $N_{1\epsilon}$ such that
$P(\|S + \mu\| \le K_\epsilon) \ge 1 - \epsilon/2$, $n \ge N_{1\epsilon}$.
Choose $b$ to satisfy

(15) $b > (K_\epsilon + z^{1/2})\alpha^{-1}$, $\alpha$ as in (10).
Then, for all sufficiently large $n$,

(16) $P(\inf_{\|\theta\| = 1,\ |r| = b} (\theta'\tilde T(rA_1\theta))^2 > z)$
$\ge P(\|S + \mu\| < -z^{1/2} + b\inf_{\|\theta\| = 1} |\theta'K_n\theta|)$
$\ge P(\|S + \mu\| \le K_\epsilon) \ge 1 - \epsilon/2$.
Therefore, by (14) and (16), there exist $N_\epsilon$ and $b$ as in (15) such that

(17) $P(\inf_{\|\theta\| = 1,\ |r| = b} (\theta'T(rA_1\theta))^2 > z) \ge 1 - \epsilon$, $n \ge N_\epsilon$.
But
$\theta'T(rA_1\theta) = \theta'A_1\sum_i (x_i - \bar x)\,\varphi(R_{ir}/(n + 1)) = \sum_i d_i\,\varphi(R_{ir}/(n + 1))$,
where $d_i := \theta'A_1(x_i - \bar x)$ and $R_{ir}$ is the rank of $Y_i - r(x_i - \bar x)'A_1\theta$. Such a linear rank statistic is nonincreasing in $r$, for every $\theta$; see, e.g., Hájek (1969; Theorem 7E, Chapter II). This together with (17) enables one to conclude (13), and hence (12). □
Theorem 4.4.1. Suppose that (1.1.1) holds and that the design matrix $X$ and the error d.f.'s $\{F_{ni}\}$ satisfy the assumptions of Lemmas 4.4.1 and 4.4.2 above. Then

(18) $A_1^{-1}(\hat\beta_2 - \beta) - K_n^{-1}\mu = K_n^{-1}S + o_p(1)$.
Proof. Follows from Lemmas 4.4.1 and 4.4.2. □
Remark 4.4.1. Arguing as in Jaeckel, combined with an argument of Lemma 4.4.2, one can show that $\|A_1^{-1}(\hat\beta_2 - \hat\beta_3)\| = o_p(1)$. Consequently, under the conditions of Lemmas 4.4.1 and 4.4.2, $\hat\beta_1$ and the Jaeckel estimator $\hat\beta_3$ also satisfy (18). □

Remark 4.4.2. Consider the case when $F_{ni} \equiv F$, $F$ a d.f. satisfying (F1), (F2). Then $\mu = 0$ and $S = T(\beta)$. Moreover, under (NX$_c$) all other assumptions of Lemmas 4.4.1 and 4.4.2 are a priori satisfied. Note that here

$\tilde f_{ni} \equiv 1$, $\bar x_n(s) \equiv \bar x_n$ and $K_n = \int f\, d\varphi(F)\cdot I_{p \times p}$.

Moreover, from Theorem 3.4.3 above, it follows that $S \to_d N_p(0, \sigma_\varphi^2 I_{p \times p})$, where

$\sigma_\varphi^2 := \int_0^1 \varphi^2(u)\, du - \left(\int_0^1 \varphi(u)\, du\right)^2$. We summarize the above discussion in


Corollary 4.4.1. Suppose that (1.1.1) with $F_{ni} \equiv F$ holds. Suppose that $F$ and $X$ satisfy (F1), (F2) and (NX$_c$). In addition, suppose that $\varphi$ is nondecreasing and bounded on (0, 1) and $\int f\, d\varphi(F) > 0$. Then

(19) $A_1^{-1}(\hat\beta_2 - \beta) = \{\int f\, d\varphi(F)\}^{-1}T(\beta) + o_p(1)$.
Moreover,
$\tau^{-1}A_1^{-1}(\hat\beta_2 - \beta) \to_d N(0, I_{p \times p})$, $\quad \tau^2 := \sigma_\varphi^2 \left(\int f\, d\varphi(F)\right)^{-2}$. □

This result is quite general as far as the conditions on the design matrix $X$ and $F$ are concerned, but not so general as far as the score function $\varphi$ is concerned. □


Remark 4.4.3. Robustness against heteroscedastic gross errors. First, we give a working definition of qualitative robustness. Consider the model (1.1.1). Suppose that we have modeled the errors $\{e_{ni}, 1 \le i \le n\}$ to be i.i.d. $F$, whereas their actual d.f.'s are $\{F_{ni}, 1 \le i \le n\}$. Let $P^n := \prod_{i=1}^n F$ and $Q^n := \prod_{i=1}^n F_{ni}$ denote the corresponding product probability measures.

Definition 4.4.1. A sequence of estimators $\hat\beta$ is said to be qualitatively robust for $\beta$ at $F$ against $Q^n$ if it is consistent for $\beta$ under $P^n$ and under those $Q^n$ that satisfy $D_n := \max_i \sup_y |F_{ni}(y) - F(y)| \to 0$.


The above definition is a variant of that of Hampel (1971). One could use the notions of weak convergence on product probability spaces to give a somewhat more general definition. For example, we could insist that the Prohorov distance between $Q^n$ and $P^n$ tend to zero instead of requiring $D_n \to 0$. We do not pursue this any further here.


The result (18) can be used to study the qualitative robustness of $\hat\beta_2$ against certain heteroscedastic errors. Consider, for example, the gross errors model where, for some $0 \le \delta_{ni} \le 1$ with $\max_i \delta_{ni} \to 0$,

$F_{ni} = (1 - \delta_{ni})F + \delta_{ni}G$, $1 \le i \le n$,

and where $G$ is a d.f. having a uniformly continuous a.e. positive density. If, in addition, $\{\delta_{ni}\}$ satisfy

(20) $\|A_1 \sum_i (x_{ni} - \bar x_n)\delta_{ni}\| = O(1)$,

then one can readily see that $K_n^{-1} = O(1)$ and $\|\mu\| = O(1)$. It follows from (18) that $\hat\beta_2$ is qualitatively robust against the above heteroscedastic gross errors at every $F$ that has a uniformly continuous a.e. positive density. Examples of $\delta_{ni}$ satisfying (20) would be

$\delta_{ni} \equiv n^{-1/2}$ or $\delta_{ni} = p^{-1/2}\|A_1(x_{ni} - \bar x_n)\|$, $1 \le i \le n$.

It may be argued that the latter choice of contaminating proportions $\{\delta_{ni}\}$ is more natural to linear regression than the former.

A similar remark is applicable to $\hat\beta_1$ and $\hat\beta_3$. □
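The two choices of $\delta_{ni}$ in the gross errors example can be checked against (20) numerically. The sketch below assumes that $A_1 = (X_c'X_c)^{-1/2}$ is the symmetric inverse square root of the centered cross-product matrix; this is our reading of (4.2a.12). It computes $\|A_1\sum_i (x_{ni} - \bar x_n)\delta_{ni}\|$ for a simulated design. For $\delta_{ni} \equiv n^{-1/2}$ the sum vanishes because $\sum_i (x_{ni} - \bar x_n) = 0$. For $\delta_{ni} = p^{-1/2}\|A_1(x_{ni} - \bar x_n)\|$ the triangle inequality gives the bound $p^{-1/2}\sum_i\|A_1(x_{ni} - \bar x_n)\|^2 = p^{1/2}$. The function name is hypothetical.

```python
import numpy as np


def contamination_norm(X, delta):
    """|| A_1 sum_i (x_i - xbar) delta_i || with A_1 = (X_c'X_c)^{-1/2} (symmetric root)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # rows x_i - xbar
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    A1 = evecs @ np.diag(evals ** -0.5) @ evecs.T      # inverse symmetric square root
    return float(np.linalg.norm(A1 @ (Xc.T @ delta))), A1
```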


4.5. ESTIMATION OF Q(f). 


Consider the model (1.1.1) with $F_{ni} \equiv F$, where $F$ is a d.f. with density $f$ on $\mathbb{R}$. Define

(1) $Q(f) := \int f\, d\varphi(F)$,
where $\varphi \in \Phi$ of (3.2.1).


As is seen from Corollary 4.4.1, the parameter $Q$ appears in the asymptotic variance of R-estimators. The complete rank analysis of the model (1.1.1) requires an estimate of $Q$. This estimate is used to standardize rank test statistics when carrying out the ANOVA of linear models using Jaeckel's dispersion $J$ of (4.4.5). See, for example, Hettmansperger (1984) and references therein for the rank based ANOVA.

Lehmann (1963) and Sen (1966) give estimators of Q in the one and 
two sample location models. These estimators are given in terms of lengths 
of confidence intervals based on linear rank statistics. Koul (1971) extended 
these estimators to the multiple linear regression model (1.1.1). In this case 
these estimators are given in terms of Lebesgue measures of certain
confidence regions based on ranks and are hard to compute for p > 1. 

Cheng and Serfling (1981) discuss several estimators of $Q$ when the observations are i.i.d. $F$, i.e., when there are no nuisance parameters. Some of these estimators are obtained by replacing, in $Q$, $f$ by a kernel type density estimator and $F$ by an empirical d.f. Schweder (1975) discusses similar estimates of $Q$ in the one sample location model.

In this section we discuss two types of estimators of $Q$. Both use a kernel type density estimator of $f$ based on the residuals and the ordinary residual empirical d.f. to estimate $F$. The difference is in the way the window width and the kernel are chosen. In one, the window width is partially based on the data and is of the order $n^{-1/2}$, and the kernel is of the histogram type; in the other, the kernel and the window width are arbitrary. It will be observed that the a.u.l. result about the residual empirical process of Corollary 2.3.5 is the basic tool needed to prove the consistency of these estimators.

We begin with the class of estimators where the window width is partly based on the data. Define

(2) $p(y) := \int [F(y + x) - F(-y + x)]\, d\varphi(F(x))$, $y \ge 0$.

Since $\varphi$ is a d.f., $p(y) = P(|e - e^*| \le y)$, where $e$ and $e^*$ are independent r.v.'s with respective d.f.'s $F$ and $\varphi(F)$. Consequently, under (F1), the density of $p$ at 0 is $2Q$. This suggests that an estimate of $Q$ can be obtained by estimating the slope of $p$ at 0.

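Recall the definition of the residual empirical process $H_n(y, t)$ from (1.2.1). Let $\hat\beta$ be an estimator of $\beta$ and define
(3) $\hat H_n(y) := H_n(y, \hat\beta)$, $y \in \mathbb{R}$.

A natural estimator of $p$ is obtained by substituting $\hat H_n$ for $F$ in $p$, viz.,

(4) $\hat p_n(y) := \int [\hat H_n(y + x) - \hat H_n(-y + x)]\, d\varphi(\hat H_n(x))$, $y \ge 0$.

Let $-\infty = \hat e_{(0)} < \hat e_{(1)} \le \hat e_{(2)} \le \dots \le \hat e_{(n)} < \hat e_{(n+1)} = \infty$ denote the ordered residuals $\{\hat e_i, 1 \le i \le n\}$, where $\hat e_i := Y_i - x_i'\hat\beta$, $1 \le i \le n$. Since $\varphi(\hat H_n)$ assigns mass $\{\varphi(j/n) - \varphi((j - 1)/n)\}$ to each $\hat e_{(j)}$ and zero mass to each of the intervals $(\hat e_{(j-1)}, \hat e_{(j)})$, $1 \le j \le n + 1$, it readily follows that, $\forall\, y \in \mathbb{R}$,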
$\hat p_n(y) = \sum_{j=1}^n \{\varphi(j/n) - \varphi((j - 1)/n)\}[\hat H_n(y + \hat e_{(j)}) - \hat H_n(-y + \hat e_{(j)})]$

(5) $\quad = n^{-1}\sum_{j=1}^n \{\varphi(j/n) - \varphi((j - 1)/n)\}\sum_{i=1}^n I(|\hat e_i - \hat e_{(j)}| \le y)$.

From (5) one sees that $\hat p_n(y)$ has the following interpretation. For each $j$, one first computes the proportion of $\{\hat e_i\}$ falling in the interval $[-y + \hat e_{(j)}, y + \hat e_{(j)}]$, and then $\hat p_n(y)$ gives the weighted average of such proportions. Formula (5) is clearly suitable for computations.


Now, if $\{h_n\}$ is a sequence of positive numbers tending to zero, an estimator of $Q$ is given by

$\hat Q_n := \hat p_n(h_n)/(2h_n)$.

This estimator can also be viewed from the density estimation point of view. Consider a kernel-type density estimator $\hat f_n$ of $f$ based on the residuals $\{\hat e_i\}$:

$\hat f_n(x) := (2nh_n)^{-1}\sum_{i=1}^n I(|x - \hat e_i| \le h_n)$,

which uses the window $w_n(x) := (1/2)I(|x| \le h_n)$. Then a natural estimator of $Q$ is

$\int \hat f_n\, d\varphi(\hat H_n) = \sum_{j=1}^n \{\varphi(j/n) - \varphi((j - 1)/n)\}\hat f_n(\hat e_{(j)}) = \hat Q_n$.
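The identity $\int \hat f_n\, d\varphi(\hat H_n) = \hat p_n(h_n)/(2h_n)$ can be confirmed numerically. The Python sketch below computes $\hat p_n$ from formula (5). Separately, it computes the $\varphi(\hat H_n)$-weighted average of the window estimator $\hat f_n$ at the ordered residuals. The score $\varphi(s) = s^2$ (a d.f. on [0, 1]), the window $h = 0.2$ and the simulated residuals are arbitrary choices made for the check. The function names are hypothetical.

```python
import numpy as np


def p_hat(resid, y, phi):
    """Formula (5): n^{-1} sum_j {phi(j/n) - phi((j-1)/n)} #{i: |e_i - e_(j)| <= y}."""
    e = np.sort(np.asarray(resid, dtype=float))
    n = e.size
    j = np.arange(1, n + 1)
    w = phi(j / n) - phi((j - 1) / n)                              # mass of phi(H_n) at e_(j)
    counts = (np.abs(e[:, None] - e[None, :]) <= y).sum(axis=1)   # row j: count over i
    return float(np.sum(w * counts) / n)


def f_hat(x, resid, h):
    """Window estimator (2 n h)^{-1} sum_i I(|x - e_i| <= h)."""
    e = np.asarray(resid, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (np.abs(x[:, None] - e[None, :]) <= h).sum(axis=1) / (2.0 * e.size * h)


def q_via_kernel(resid, h, phi):
    """Integral of f_hat with respect to phi(H_n)."""
    e = np.sort(np.asarray(resid, dtype=float))
    n = e.size
    j = np.arange(1, n + 1)
    w = phi(j / n) - phi((j - 1) / n)
    return float(np.sum(w * f_hat(e, e, h)))
```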


Schweder (1975) studied the asymptotic properties of this estimator in the one sample location model. Observe that in this case the estimator of $Q$ does not depend on the estimator of the location parameter, which makes its asymptotic properties relatively easy to derive.

In $\hat Q_n$ there is an arbitrariness due to the choice of the window width $h_n$. Here we recommend that $h_n$ be determined from the spread of the data as follows. Let $0 < \alpha < 1$, let $t_n^\alpha$ be the $\alpha$th quantile of $\hat p_n$, and define the estimator $\hat Q_n$ of $Q$ as

(6) $\hat Q_n := \hat p_n(n^{-1/2}t_n^\alpha)/(2n^{-1/2}t_n^\alpha)$.
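For $\varphi(s) = s$, where $Q(f) = \int f^2(x)\, dx$, the recipe (6) reduces to counting close pairs of residuals. The Python sketch below illustrates it under stated assumptions. A least squares fit (in the test) supplies $\hat\beta$, and $\alpha = 1/2$. The $\alpha$th quantile of $\hat p_n$ is computed with NumPy's default (linear interpolation) quantile rule over all $n^2$ values $|\hat e_i - \hat e_j|$, and the sketch needs $O(n^2)$ memory. For standard normal errors $Q(f) = 1/(2\sqrt\pi) \approx 0.282$. With $n = 1500$ the estimate should lie near this value, up to a small upward bias from the $n$ pairs with $i = j$. The function name is hypothetical.

```python
import numpy as np


def q_hat(resid, alpha=0.5):
    """Estimator (6) of Q(f) with phi(s) = s.

    p_hat_n is then the e.d.f. of the n^2 values |e_i - e_j| (pairs i = j
    included), and the window is h = n^{-1/2} t_n^alpha.
    """
    e = np.asarray(resid, dtype=float)
    n = e.size
    dist = np.abs(e[:, None] - e[None, :])       # all n^2 pairwise distances
    t_alpha = np.quantile(dist, alpha)            # alpha-th quantile of p_hat_n
    h = t_alpha / np.sqrt(n)
    p_hat_h = np.mean(dist <= h)                  # p_hat_n(h)
    return p_hat_h / (2.0 * h)
```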


The quantile $t_n^\alpha$ is an estimator of the $\alpha$th quantile $t^\alpha$ of $p$. Note that if $\varphi(s) = s$, then $t^\alpha$ is the $\alpha$th quantile of the distribution of $|e_1 - e_2|$, and $t_n^\alpha$ is the $\alpha$th quantile of the empirical d.f. $\hat p_n$ of the r.v.'s $\{|\hat e_i - \hat e_j|, 1 \le i, j \le n\}$. Thus, e.g., $t_n^{1/2}$ corresponds to $s_2$ of (4.3.5). Similarly, if $\varphi(s) = I(s \ge 1/2)$ and $F(0) = 1/2$, then $t^\alpha$ is the $\alpha$th quantile of the d.f. of $|e_1|$, and $t_n^\alpha$ is essentially that of the empirical d.f. of $|\hat e_i|$, $1 \le i \le n$. Again, here $t_n^{1/2}$ would correspond to $s_1$ of (4.3.5). In any case, in general, $t^\alpha$ is a scale parameter in the sense of Bickel and Lehmann (1975). The consistency of $\hat Q_n$ is asserted in the following


Theorem 4.5.1. Let (1.1.1) hold with $F_{ni} \equiv F$. In addition to (NX), (F1) and (F2), assume that $\hat\beta$ is an estimator of $\beta$ satisfying (4.3.17). Then,

(7) $\sup_{\varphi \in \Phi} |\hat Q_n - Q(f)| = o_p(1)$.

The proof of (7) will be a consequence of the following three lemmas.


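Lemma 4.5.1. Under the assumptions of Theorem 4.5.1, $\forall\, 0 < a < \infty$,

(8) $\sup_{\varphi \in \Phi,\ 0 \le z \le a} |n^{1/2}\{\hat p_n(n^{-1/2}z) - p(n^{-1/2}z)\}| = o_p(1)$.

Consequently, $\forall\, 0 < a < \infty$,

(9) $\sup_{\varphi \in \Phi,\ 0 \le z \le a} |n^{1/2}\hat p_n(n^{-1/2}z) - 2zQ(f)| = o_p(1)$.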


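Proof. We shall apply Corollary 2.3.5. Let $v := A^{-1}(\hat\beta - \beta)$ and $b_n := n^{-1/2}\sum_i x_i'A$. Then, from (2.3.46), (3) and (4.3.17), we obtain

(10) $\sup_{-\infty \le y \le \infty} |n^{1/2}\{\hat H_n(y) - H_n(y)\} - b_n v f(y)| = o_p(1)$,
where
$H_n(y) := H_n(y, \beta) = n^{-1}\sum_i I(e_i \le y)$, $y \in \mathbb{R}$.

Also, we will use the notation of (2.3.1) with

(11) $X_{ni} \equiv e_i$, $\quad d_{ni} \equiv n^{-1/2}$, $\quad F_{ni} \equiv F$, $1 \le i \le n$.

Then $Y_1(t, 0) = n^{1/2}[H_n(F^{-1}(t)) - t]$, $0 \le t \le 1$. Write $Y_1(\cdot)$ for $Y_1(\cdot, 0)$.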
Now, (10) and $\psi$ bounded imply that

$n^{1/2}\{\hat\rho_n(y) - \rho(y)\} = n^{1/2}\int\{H_n(y+x) - H_n(-y+x)\}\,d\psi(\hat H_n(x)) + b_nv\int[f(y+x) - f(-y+x)]\,d\psi(\hat H_n(x)) - n^{1/2}\rho(y) + o_p(1)$

(12)  $= R_{n1}(y) + R_{n2}(y) + R_{n3}(y) + o_p(1)$,

where $o_p(1)$ stands for a sequence of random processes that converge to zero, uniformly in $-\infty<y<\infty$, $\psi\in\mathcal C$, in probability, and where

$R_{n1}(y) = \int\{Y_1(F(y+x)) - Y_1(F(-y+x))\}\,d\psi(\hat H_n(x))$,
$R_{n2}(y) = b_nv\int[f(y+x) - f(-y+x)]\,d\psi(\hat H_n(x))$,
$R_{n3}(y) = n^{1/2}\Big\{\int[F(y+x) - F(-y+x)]\,d\psi(\hat H_n(x)) - \int[F(y+x) - F(-y+x)]\,d\psi(F(x))\Big\}$, $y\in\mathbb R$.
From (F1), (F2), the boundedness of $\psi$, and the asymptotic continuity of $Y_1$, which follows from Corollary 2.2a.1 applied to the quantities given in (11), we obtain, with $k = 2a\|f\|_\infty$,

(13)  $\sup_{0\le z\le a,\ \psi\in\mathcal C}|R_{n1}(n^{-1/2}z)| \le \sup_{|t-s|\le kn^{-1/2}}|Y_1(t) - Y_1(s)| = o_p(1)$.

Again, (F1) and the boundedness of $\psi$ imply, in a routine fashion, that

(14)  $\sup_{0\le z\le a,\ \psi\in\mathcal C}|R_{n2}(n^{-1/2}z)| = o_p(1)$.


Now consider $R_{n3}$. By the MVT, (F1) and the boundedness of $\psi$, the first term of $R_{n3}(n^{-1/2}z)$ can be written as

$2z\int f(\xi_{zn}(x))\,d\psi(\hat H_n(x)) = 2z\int f(x)\,d\psi(\hat H_n(x)) + o_p(1)$,

where $\{\xi_{zn}\}$ are real numbers such that $|\xi_{zn}(x) - x|\le an^{-1/2}$. Do the same with the second integral and put the two together to obtain, with $q(t) := f(F^{-1}(t))$,

$R_{n3}(n^{-1/2}z) = 2z\Big\{\int f\,d\psi(\hat H_n) - \int f\,d\psi(F)\Big\} + o_p(1) = 2z\Big\{\int_0^1[q(F(\hat H_n^{-1}(t))) - q(t)]\,d\psi(t)\Big\} + o_p(1)$.


But,

(15)  $\sup_{0<t<1}|F(\hat H_n^{-1}(t)) - t| \le n^{-1} + \sup_y|\hat H_n(y) - F(y)| = o_p(1)$

by (10) and the Glivenko-Cantelli Lemma. Hence, $q$ being uniformly continuous, we obtain

$\sup_{0\le z\le a,\ \psi\in\mathcal C}|R_{n3}(n^{-1/2}z)| = o_p(1)$.


This together with (12)-(15) completes the proof of (8), whereas that of (9) follows from (8) and the fact that the uniform continuity of $f$ implies that

$\sup_{0\le z\le a,\ \psi\in\mathcal C}|n^{1/2}\rho(n^{-1/2}z) - 2zQ(f)| \to 0$. □


Lemma 4.5.2. Under the assumptions of Theorem 4.5.1, $\forall\,y>0$,
$\sup_{\psi\in\mathcal C}|\hat\rho_n(y) - \rho(y)| = o_p(1)$.

Proof. Proceed as in the proof of the previous lemma to rewrite

$\hat\rho_n(y) - \rho(y) = \Gamma_{n1}(y) + \Gamma_{n2}(y) + \Gamma_{n3}(y) + o_p(1)$,

where $\Gamma_{nj} = n^{-1/2}R_{nj}$, $j=1,2,3$, with $R_{nj}$ defined at (12). By Corollary 2.2a.2 applied to the quantities given at (10), $\|Y_1\|_\infty = O_p(1)$, and hence $f$, $\psi$ bounded trivially imply that

$\sup_{\psi\in\mathcal C,\ y>0}|\Gamma_{nj}(y)| = o_p(1)$, $j = 1,2$.


Now, rewrite

$\Gamma_{n3}(y) = \Big[\int F(y+x)\,d\psi(\hat H_n(x)) - \int F(y+x)\,d\psi(F(x))\Big] - \Big[\int F(-y+x)\,d\psi(\hat H_n(x)) - \int F(-y+x)\,d\psi(F(x))\Big] = T_n(y) - T_n(-y)$, say.

But, $\forall\,y\in\mathbb R$,

$T_n(y) = \int_0^1\{F(y + F^{-1}(F(\hat H_n^{-1}(t)))) - F(y + F^{-1}(t))\}\,d\psi(t) = o_p(1)$,

because of (15) and because, by (F1) and (F2), $\forall\,y>0$, $F(y + F^{-1}(t))$ is a uniformly continuous function of $t\in[0,1]$. □


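Lemma 4.5.3. Under the conditions of Theorem 4.5.1, $\forall\,\varepsilon>0$,
$P(|\hat t_n - t^\psi|\le\varepsilon t^\psi,\ \forall\,\psi\in\mathcal C) \to 1$.

Proof. Observe that the event $[\hat\rho_n((1-\varepsilon)t^\psi) < \alpha < \hat\rho_n((1+\varepsilon)t^\psi)]$ implies the event $[(1-\varepsilon)t^\psi\le\hat t_n\le(1+\varepsilon)t^\psi]$. Hence, by two applications of Lemma 4.5.2, once with $y = (1+\varepsilon)t^\psi$ and once with $y = (1-\varepsilon)t^\psi$, we obtain that

$\liminf_n P(|\hat t_n - t^\psi|\le\varepsilon t^\psi,\ \forall\,\psi\in\mathcal C) \ge P(\rho((1-\varepsilon)t^\psi) < \alpha < \rho((1+\varepsilon)t^\psi),\ \forall\,\psi\in\mathcal C) = 1$. □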
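Proof of Theorem 4.5.1. Clearly, $\forall\,\psi\in\mathcal C$,

$|\hat Q_n - Q(f)| = (2\hat t_n)^{-1}\,|n^{1/2}\hat\rho_n(n^{-1/2}\hat t_n) - 2\hat t_n Q(f)|$.

By Lemma 4.5.3, $\forall\,\varepsilon>0$,
$P(0<\hat t_n\le(1+\varepsilon)t^\psi,\ \forall\,\psi\in\mathcal C) \to 1$.

Hence (7) follows from (9) applied with $a = (1+\varepsilon)t^\psi$, Lemma 4.5.3 and Slutsky's Theorem. □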


Remark 4.5.1. The estimator $\hat Q_n$ shifts the burden of choosing the window width to the choice of $\alpha$. There does not seem to be an easy way to recommend a universal $\alpha$. In an empirical study done in Koul, Sievers and McKean (1987) that investigated level and power of some rank tests in the linear regression setting, $\alpha = 0.8$ was found to be most desirable. □


Remark 4.5.2. It is an interesting theoretical exercise to see if, for some $0<\delta<1$, the processes $\{n^{1/2}(\hat Q_n - Q(f));\ \delta\le\alpha\le1-\delta\}$, with $\hat Q_n$ viewed as a function of $\alpha$, converge weakly to a Gaussian process. In the case $\psi(t)=t$, Thewarapperuma (1987) has proved, under (F1), (F2), (NX) and (4.3.17), that for every fixed $0<\alpha<1$, $n^{1/2}(\hat Q_n - Q(f)) \to_d N(0,\sigma^2)$, where $\sigma^2 = 16\{\int f^3(x)\,dx - (\int f^2(x)\,dx)^2\}$. □


Remark 4.5.3. As mentioned earlier, $\{\hat t_n,\ \psi\in\mathcal C\}$ provides a class of scale estimators for the class of scale parameters $\{t^\psi,\ \psi\in\mathcal C\}$. Recall that $s_1$ and $s_2$ of (4.3.5) are special cases of these estimators. The former is obtained by taking $\psi(u) = I(u>0)$ and the latter by taking $\psi(u) = u$. For general interest we state a theorem below giving the asymptotic normality of these estimators. The details of the proof are similar to those of Theorem 4.3.1. To state this theorem we need to introduce appropriately modified analogues of the entities defined at (4.3.15):

$K_1(y) = \int[Y_1^*(y+x) - Y_1^*(-y+x)]\,d\psi(F(x))$,
$K_2(y) = \int Y_1^*(x)\,[f(y+x) - f(-y+x)]\,\{f(x)\}^{-1}\,d\psi(F(x))$,
$K(y) := K_1(y) - K_2(y)$, $y\ge0$,

where $Y_1^*$ is as in (4.3.15), adapted to the i.i.d. errors setup. It is easy to check that $K(t^\psi)$ is $n^{-1/2}\times$\{a sum of i.i.d. r.v.'s\} with $EK(t^\psi) = 0$ and $0<(\sigma^\psi)^2 := \mathrm{Var}(K(t^\psi))<\infty$, not depending on $n$.

Theorem 4.5.2. In addition to the conditions of Theorem 4.5.1, assume that either $\psi(t) = I(t>u)$, $0<u<1$ fixed, or $\psi$ is uniformly differentiable on $[0,1]$. Then, $\forall\,0<\alpha<1$,

$n^{1/2}(\hat t_n - t^\psi) \to_d N(0, (\tau^\psi)^2)$,

where

$(\tau^\psi)^2 := (\sigma^\psi)^2\Big\{\int[f(t^\psi+x) + f(-t^\psi+x)]\,d\psi(F(x))\Big\}^{-2}$.


We now turn to the arbitrary window width and kernel-type estimators of Q. Accordingly, let K be a probability density on $\mathbb R$, $h_n$ be a sequence of positive numbers, and $f$ and $\{\hat e_i\}$ be as before. Define

$\hat f_n(x) := (nh_n)^{-1}\sum_{i=1}^n K((x-\hat e_i)/h_n)$,
$f_n(x) := (nh_n)^{-1}\sum_{i=1}^n K((x-e_i)/h_n)$, $x\in\mathbb R$,
$\hat Q_n := \int\hat f_n(x)\,d\psi(\hat H_n(x))$.
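For comparison with the window estimator, here is a minimal sketch of the kernel-type estimator with $\psi(s)=s$, in which case $\hat Q_n = n^{-1}\sum_i\hat f_n(\hat e_i) = (n^2h_n)^{-1}\sum_{i,j}K((\hat e_i-\hat e_j)/h_n)$. The Gaussian kernel and the bandwidth $h_n = n^{-1/4}$ are illustrative assumptions, chosen only because they satisfy conditions (i) and (ii) of Theorem 4.5.3 below; the function names are hypothetical and simulated errors again stand in for residuals.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density: absolutely continuous, with integrable derivative."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def q_hat_kernel(residuals, h=None):
    """Kernel-type estimator int f_hat_n d psi(H_hat_n) for psi(s) = s, i.e.
    n^{-1} sum_i f_hat_n(e_i) = (n^2 h)^{-1} sum_{i,j} K((e_i - e_j) / h)."""
    n = len(residuals)
    if h is None:
        h = n ** -0.25  # h -> 0 while n^{1/2} h -> infinity, as in condition (i)
    total = 0.0
    for ei in residuals:
        for ej in residuals:
            total += gaussian_kernel((ei - ej) / h)
    return total / (n * n * h)


rng = random.Random(2024)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(round(q_hat_kernel(sample), 3))
```

The double loop is $O(n^2)$, which is adequate for illustration; unlike (6), the bandwidth here is fixed in advance rather than read off the spread of the data.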
Theorem 4.5.3. Assume that the model (1.1.1) with $F_{ni}\equiv F$ holds. In addition, assume that (F1), (F2), (NX) and (4.3.17) and the following hold:

(i) $h_n>0$, $h_n\to0$, $n^{1/2}h_n\to\infty$.
(ii) K is absolutely continuous with its a.e. derivative $\dot K$ satisfying $\int|\dot K|<\infty$.

Then,

(16)  $\sup_{\psi\in\mathcal C}|\hat Q_n - Q(f)| = o_p(1)$.
Proof. First we show that $\hat f_n$ approximates $f$. This is done in several steps. To begin with, summation by parts shows that

$\hat f_n(x) - f_n(x) = h_n^{-1}\int[\hat H_n(x-h_nz) - H_n(x-h_nz)]\,\dot K(z)\,dz$,

so that

$\|\hat f_n - f_n\|_\infty \le (n^{1/2}h_n)^{-1}\,\|n^{1/2}(\hat H_n - H_n)\|_\infty\int|\dot K|$.

Hence, by (10) and the fact that $|b_nv| = O_p(1)$, guaranteed by (4.3.17), it readily follows that

(17)  $\|\hat f_n - f_n\|_\infty = O_p((n^{1/2}h_n)^{-1}) = o_p(1)$.
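Now, let

$\bar f_n(x) := h_n^{-1}\int K((x-y)/h_n)\,f(y)\,dy$.

Note that integration by parts shows that

$\bar f_n(x) = -h_n^{-1}\int K(z)\,dF(x-h_nz) = h_n^{-1}\int F(x-h_nz)\,\dot K(z)\,dz$,

so that

(18)  $\|f_n - \bar f_n\|_\infty \le (n^{1/2}h_n)^{-1}\,\|n^{1/2}[H_n - F]\|_\infty\int|\dot K| = o_p(1)$,

by (i) and by the fact that $\|n^{1/2}(H_n - F)\|_\infty = O_p(1)$. Moreover,

(19)  $\|\bar f_n - f\|_\infty \le \sup_{|y-x|\le h_n}|f(y) - f(x)| = o(1)$, by (F1).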
Now, consider the difference

$\hat Q_n - Q(f) = \int(\hat f_n - f)\,d\psi(\hat H_n) + \int f\,d[\psi(\hat H_n) - \psi(F)] = D_{n1} + D_{n2}$, say.

Let $q(t) = f(F^{-1}(t))$. Then

$\sup_{\psi\in\mathcal C}|D_{n2}| \le \sup_{0\le t\le1}|q(F(\hat H_n^{-1}(t))) - q(t)| = o_p(1)$

by the uniform continuity of $q$ and (15). Also,

$\sup_{\psi\in\mathcal C}|D_{n1}| \le \|\hat f_n - f\|_\infty = o_p(1)$

by (17)-(19), thereby proving (16). □




CHAPTER 5 
MINIMUM DISTANCE ESTIMATORS 


5.1. INTRODUCTION 


The practice of obtaining estimators of parameters by minimizing a certain distance between some functions of observations and parameters has long been present in statistics. The classical examples of this method are the least squares and the minimum chi-square estimators.

The minimum distance estimation (m.d.e.) method, where one obtains 
an estimator of a parameter by minimizing some distance between the 
empirical d.f. and the modeled d.f., was elevated to a general method of 
estimation by Wolfowitz (1953, 1954, 1957). In these papers he 
demonstrated that compared to the maximum likelihood estimation method, 
the m.d.e. method yielded consistent estimators rather cheaply in several 
problems of varied levels of difficulty. 

This methodology saw increasing research activity from the mid-seventies, when many authors demonstrated various robustness properties of certain m.d. estimators. Beran (1977) showed that in the i.i.d. setup the minimum Hellinger distance estimators, obtained by minimizing the Hellinger distance between the modeled parametric density and an empirical density estimate, are asymptotically efficient at the true model and robust against small departures from the model, where smallness is measured in terms of the Hellinger metric. Beran (1978) demonstrated the power of minimum Hellinger distance estimators in the one sample location model by showing that the estimators obtained by minimizing the Hellinger distance between an estimator of the density of the residual and an estimator of the density of the negative residual are qualitatively robust and adaptive for all those symmetric error distributions that have finite Fisher information.

Parr and Schucany (1979) empirically demonstrated that in certain location models several minimum distance estimators (where "several" refers to the types of distance chosen) are robust. Millar (1981, 1982, 1984) proved local asymptotic minimaxity of a fairly large class of m.d. estimators, using Cramér-von Mises type distances, in the i.i.d. setup. Donoho and Liu (1988a, b) demonstrated certain further finite sample robustness properties of a large class of m.d. estimators and certain additional advantages of using Cramér-von Mises and Hellinger distances. All of these authors restrict their attention to the one sample setup or to the two sample location model. See Parr (1981) for additional bibliography on m.d.e. through 1980.

Little was known until the early 1980s about how to extend the above methodology to one of the most applied models, viz., the multiple linear regression model (1.1.1). Given the above optimality properties in the one- and two-sample location models, it became even more desirable to extend this methodology to this model. Only after realizing that one should use the weighted, rather than the ordinary, empiricals of the residuals to define m.d. estimators was it possible to extend this methodology satisfactorily to the model (1.1.1).
The main focus of this chapter is the m.d. estimators of $\beta$ obtained by minimizing Cramér-von Mises type distances involving various w.e.p.'s. Some m.d. estimators involving the supremum distance are also discussed. Most of the estimators provide appropriate extensions of their counterparts in the one- and two-sample location models.

Section 5.2 contains definitions of several m.d. estimators. Their finite sample properties and asymptotic distributions are discussed in Sections 5.3 and 5.5, respectively. Section 5.4 discusses an asymptotic theory about general minimum dispersion estimators that is of broad and independent interest; it is a self-contained section. Asymptotic relative efficiency and qualitative robustness of some of the m.d. estimators of Section 5.2 are discussed in Section 5.6. Some of the proposed m.d. functionals are Hellinger differentiable in the sense of Beran, as is shown in Section 5.6. Consequently they are locally asymptotically minimax (l.a.m.) in the sense of Hájek and Le Cam.


5.2. DEFINITIONS OF M.D. ESTIMATORS 


To motivate the following definitions of m.d. estimators of $\beta$ of (1.1.1), first consider the one sample location model where $Y_1-\theta,\dots,Y_n-\theta$ are i.i.d. F, F a known d.f. Let

(1)  $F_n(y) = n^{-1}\sum_{i=1}^n I(Y_i\le y)$, $y\in\mathbb R$.

If $\theta$ is true then $EF_n(y+\theta) = F(y)$, $\forall\,y\in\mathbb R$. This motivates one to define the m.d. estimator $\hat\theta$ of $\theta$ by the relation

(2)  $\hat\theta = \mathrm{argmin}\{T(t);\ t\in\mathbb R\}$,

where, for a $G\in\mathcal{DI}(\mathbb R)$,

(3)  $T(t) = n\int[F_n(y+t) - F(y)]^2\,dG(y)$, $t\in\mathbb R$.

Observe that (2) and (3) actually define a class of estimators $\hat\theta$, one corresponding to each G.


Now suppose that in (1.1.1) we model the d.f. of $e_{ni}$ to be a known d.f. $H_{ni}$, which may be different from the actual d.f. $F_{ni}$, $1\le i\le n$. How should one define a m.d. estimator of $\beta$? Any definition should reduce to $\hat\theta$ when (1.1.1) is reduced to the one sample location model. One possible extension is to define

(4)  $\hat\beta_1 = \mathrm{argmin}\{K_1(t);\ t\in\mathbb R^p\}$,

where

(5)  $K_1(t) = n^{-1}\int\Big[\sum_{i=1}^n\{I(Y_{ni}\le y + x_{ni}'t) - H_{ni}(y)\}\Big]^2\,dG(y)$, $t\in\mathbb R^p$.

If in (1.1.1) we take $p=1$, $x_{ni1}\equiv1$ and $H_{ni}\equiv F$, then clearly it reduces to the one sample location model and $\hat\beta_1$ coincides with $\hat\theta$ of (2). But this is also true for the estimator $\hat\beta_X$ defined as follows. Recall the definition of $\{V_j\}$ from (1.2.1). Define, for $y\in\mathbb R$, $t\in\mathbb R^p$, $1\le j\le p$,

(6)  $\mathcal Z_j(y,t) = V_j(y,t) - \sum_{i=1}^n x_{nij}H_{ni}(y)$.

Let

(7)  $K_X(t) := \int\mathcal Z'(y,t)(X'X)^{-1}\mathcal Z(y,t)\,dG(y)$, $t\in\mathbb R^p$,

where $\mathcal Z' := (\mathcal Z_1,\dots,\mathcal Z_p)$, and define

(8)  $\hat\beta_X = \mathrm{argmin}\{K_X(t);\ t\in\mathbb R^p\}$.


Which of the two estimators is the right extension of $\hat\theta$? Since $\{V_j,\ 1\le j\le p\}$ summarize the data in (1.1.1) with probability one under the continuity assumption of $\{e_{ni},\ 1\le i\le n\}$, $\hat\beta_X$ should be considered the right extension of $\hat\theta$. In Section 5.6 we shall see that $\hat\beta_X$ is asymptotically efficient among a class of estimators $\{\hat\beta_D\}$ defined as follows.


Let $D = ((d_{nij}))$, $1\le i\le n$, $1\le j\le p$, be an $n\times p$ real matrix,

(9)  $V_{jd}(y,t) = \sum_{i=1}^n d_{nij}I(Y_{ni}\le y + x_{ni}'t)$, $y\in\mathbb R$, $1\le j\le p$,

and

(10)  $K_D(t) = \sum_{j=1}^p\int\Big[V_{jd}(y,t) - \sum_{i=1}^n d_{nij}H_{ni}(y)\Big]^2\,dG(y)$, $t\in\mathbb R^p$.

Define

(11)  $\hat\beta_D = \mathrm{argmin}\{K_D(t);\ t\in\mathbb R^p\}$.

If $D = n^{-1/2}[\mathbf 1, 0,\dots,0]_{n\times p}$ then $\hat\beta_D = \hat\beta_1$, and if $D = XA$ then $\hat\beta_D = \hat\beta_X$, where A is as in (2.3.32). The above mentioned optimality of $\hat\beta_X$ is stated and proved in Theorem 5.6a.1.
Another way to define m.d. estimators in the case the modeled error d.f.'s are known is as follows. Let

(12)  $M(s,y,t) = n^{-1/2}\sum_{i=1}^{[ns]}\{I(Y_{ni}\le y) - H_{ni}(y - x_{ni}'t)\}$, $s\in[0,1]$, $y\in\mathbb R$,

(13)  $Q(t) = \int\!\!\int\{M(s,y,t)\}^2\,dG(y)\,dL(s)$, $t\in\mathbb R^p$,

where L is a d.f. on $[0,1]$. Define

(14)  $\tilde\beta = \mathrm{argmin}\{Q(t);\ t\in\mathbb R^p\}$.

The estimator $\tilde\beta$ with $L(s) = s$ is essentially Millar's (1982) proposal.

Now suppose $\{H_{ni}\}$ are unknown. How should one define m.d. estimators of $\beta$ in this case? Again, let us examine the one sample location model. In this case $\theta$ can not be identified unless the errors are symmetric about 0. Suppose that is the case. Then the r.v.'s $\{Y_i-\theta,\ 1\le i\le n\}$ have the same distribution as $\{-Y_i+\theta,\ 1\le i\le n\}$. A m.d. estimator $\hat\theta^+$ of $\theta$ is thus defined by the relation

(15)  $\hat\theta^+ = \mathrm{argmin}\{T^+(t);\ t\in\mathbb R\}$,

where

(16)  $T^+(t) = n^{-1}\int\Big[\sum_{i=1}^n\{I(Y_i\le y+t) - I(-Y_i<y-t)\}\Big]^2\,dG(y)$.

An extension of $\hat\theta^+$ to the model (1.1.1) is $\hat\beta_X^+$ defined by the relation

(17)  $\hat\beta_X^+ = \mathrm{argmin}\{K_X^+(t);\ t\in\mathbb R^p\}$,

where, for $t\in\mathbb R^p$,

(18)  $K_X^+(t) := \int V^{+\prime}(y,t)(X'X)^{-1}V^+(y,t)\,dG(y)$, $V^{+\prime} := (V_1^+,\dots,V_p^+)$;
$V_j^+(y,t) = \sum_{i=1}^n x_{nij}\{I(Y_{ni}\le y + x_{ni}'t) - I(-Y_{ni}<y - x_{ni}'t)\}$, $y\in\mathbb R$, $1\le j\le p$.


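More generally, a class of m.d. estimators of $\beta$ can be defined as follows. Let D be as before. Define, for $y\in\mathbb R$, $1\le j\le p$,

(19a)  $\mathcal Y_{jd}^+(y,t) = \sum_{i=1}^n d_{nij}\{I(Y_{ni}\le y + x_{ni}'t) - I(-Y_{ni}<y - x_{ni}'t)\}$.

Let $\mathcal Y_d^{+\prime} := (\mathcal Y_{1d}^+,\dots,\mathcal Y_{pd}^+)$ and define

(19b)  $K_D^+(t) = \int\mathcal Y_d^{+\prime}(y,t)\,\mathcal Y_d^+(y,t)\,dG(y)$, $t\in\mathbb R^p$,

and $\hat\beta_D^+$ by the relation

(20)  $\hat\beta_D^+ = \mathrm{argmin}\{K_D^+(t);\ t\in\mathbb R^p\}$.

Note that $\hat\beta_X^+$ is $\hat\beta_D^+$ with $D = XA$.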


Next, suppose that the errors in (1.1.1) are modeled to be i.i.d., i.e., $H_{ni}\equiv F$, and F is unknown and not necessarily symmetric. Here, of course, the location parameter can not be estimated. However, the regression parameter vector $\beta$ can be estimated provided the rank of $X_c$ is p, where $X_c$ is as in Section 4.3. In this case a class of m.d. estimators of $\beta$ is defined by $\hat\beta_D$ of (11), provided we assume that

(21)  $\sum_{i=1}^n d_{nij} = 0$, $1\le j\le p$.

Under (21) the unknown F drops out of (10), since $\sum_i d_{nij}H_{ni}(y) = F(y)\sum_i d_{nij} = 0$. A member of this class that is of interest is $\hat\beta_D$ with $D = X_cA_c$, $A_c$ as in (4.3.11).


Another way to define m.d. estimators here is via the ranks. With $R_{it}$ as in (3.1.1), let

(22)  $T_{jd}(s,t) = \sum_{i=1}^n d_{nij}I(R_{it}\le ns)$, $s\in[0,1]$, $1\le j\le p$,

$K_D^*(t) = \int T_d'(s,t)\,T_d(s,t)\,dL(s)$, $t\in\mathbb R^p$,

where $T_d' := (T_{1d},\dots,T_{pd})$ and L is a d.f. on $[0,1]$. Assume that D satisfies (21). Define

(23)  $\hat\beta_D^* = \mathrm{argmin}\{K_D^*(t);\ t\in\mathbb R^p\}$.


Observe that $\{\hat\beta_D\}$, $\{\hat\beta_D^+\}$ and $\{\tilde\beta\}$ are not scale invariant in the sense of (4.3.2). One way to make them so is to modify their definitions as follows. Define

(24)  $K_D(a,t) = \sum_{j=1}^p\int\Big[V_{jd}(ay,t) - \sum_{i=1}^n d_{nij}H_{ni}(ay)\Big]^2\,dG(y)$,

$K_D^+(a,t) = \int\mathcal Y_d^{+\prime}(ay,t)\,\mathcal Y_d^+(ay,t)\,dG(y)$, $t\in\mathbb R^p$, $a>0$.

Now, scale invariant analogues of $\hat\beta_D$ and $\hat\beta_D^+$ are defined as

(25)  $\hat\beta_{Ds} := \mathrm{argmin}\{K_D(s,t);\ t\in\mathbb R^p\}$,  $\hat\beta_{Ds}^+ := \mathrm{argmin}\{K_D^+(s,t);\ t\in\mathbb R^p\}$,

where s is a scale estimator satisfying (4.3.3) and (4.3.4). One can modify $\{\tilde\beta\}$ in a similar fashion to make it scale invariant. The class of estimators $\{\hat\beta_D^*\}$ is scale invariant because the ranks are.


Now we define a m.d. estimator based on the supremum distance in the case the errors are correctly modeled to be i.i.d. F, F an arbitrary d.f. Here we shall restrict ourselves only to the case $p=1$. Define

(26)  $V_c(y,t) = \sum_{i=1}^n(x_i-\bar x)I(Y_i\le y + tx_i)$, $t, y\in\mathbb R$,

$D_n^+(t) := \sup\{V_c(y,t);\ y\in\mathbb R\}$,
$D_n^-(t) := -\inf\{V_c(y,t);\ y\in\mathbb R\}$,
$D_n(t) := \max\{D_n^+(t), D_n^-(t)\} = \sup\{|V_c(y,t)|;\ y\in\mathbb R\}$, $t\in\mathbb R$.

Finally, define the m.d. estimator

(27)  $\hat\beta_s := \mathrm{argmin}\{D_n(t);\ t\in\mathbb R\}$.
Section 5.3 discusses some computational aspects, including the existence and some finite sample properties of the above estimators. Section 5.5 proves the uniform asymptotic quadraticity of $K_D$, $K_D^+$, $K_D^*$ and Q as processes in t. These results are used in Section 5.6 to study the asymptotic distributions and robustness of the above defined estimators.


5.3. FINITE SAMPLE PROPERTIES AND EXISTENCE


The purpose here is to discuss some computational aspects, the existence and 
the finite sample properties of the four classes of estimators introduced in the 
previous section. To facilitate this, the dependence of these estimators and
their defining statistics on the weight matrix D will not be exhibited in this 
section. 

We first turn to some computational aspects of these estimators. To begin with, suppose that $p=1$ and $G(y)=y$ in (5.2.10) and (5.2.11). Write $\beta$, $x_i$, $d_i$ for $\beta_1$, $x_{i1}$, $d_{i1}$, respectively, $1\le i\le n$. Then

(1)  $K(t) = \int\Big[\sum_i d_i\{I(Y_i\le y + x_it) - H_i(y)\}\Big]^2dy = \sum_i\sum_j d_id_j\int\{I(Y_i\le y + x_it) - H_i(y)\}\{I(Y_j\le y + x_jt) - H_j(y)\}\,dy$.


No further simplification of this occurs except for some special cases. One of them is the case of the one sample location model where $x_i\equiv1$ and $H_i\equiv F$, in which case

$K(t) = \int\Big[\sum_i d_i\{I(Y_i\le y) - F(y-t)\}\Big]^2dy$.

Differentiating under the integral sign w.r.t. t (which can be justified under the sole assumption that F has a density f w.r.t. Lebesgue measure $\lambda$), one obtains

$\dot K(t) = 2\Big(\sum_j d_j\Big)\int\sum_i d_i\{I(Y_i\le y+t) - F(y)\}\,dF(y) = -2\Big(\sum_j d_j\Big)\sum_i d_i\{F(Y_i-t) - 1/2\}$.

Upon taking $d_i\equiv n^{-1/2}$, one sees that in the one sample location model $\hat\theta$ of (5.2.2) corresponding to $G(y)=y$ is given as a solution of

(2)  $\sum_i F(Y_i-\hat\theta) = n/2$.

Note that this $\hat\theta$ is precisely the m.l.e. of $\theta$ when $F(x) = \{1+\exp(-x)\}^{-1}$, i.e., when the errors have the logistic distribution!
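Equation (2) is easy to solve numerically. The sketch below assumes the logistic F of the remark above and uses bisection, which is valid because $\sum_i F(Y_i-\theta)$ is nonincreasing in $\theta$ and, since $F(0)=1/2$, is at least $n/2$ at $\min_i Y_i$ and at most $n/2$ at $\max_i Y_i$. The function names are hypothetical.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def md_location(ys, F=logistic_cdf, iters=200):
    """Solve sum_i F(Y_i - theta) = n/2, equation (2), by bisection.
    Assumes F is a continuous d.f. with F(0) = 1/2, so the root lies in [min Y, max Y]."""
    n = len(ys)

    def score(theta):
        return sum(F(y - theta) for y in ys) - n / 2

    lo, hi = min(ys), max(ys)  # score(lo) >= 0 >= score(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


print(md_location([-2.0, -1.0, 0.0, 1.0, 2.0]))
print(md_location([0.1, 0.4, 0.5, 2.7, 9.0]))
```

A fixed number of bisection steps is used instead of a tolerance test so that the loop terminates even when the data are large in magnitude.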


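Another simplification of (1) occurs when we assume $\sum_i d_i = 0$ and $H_i\equiv F$. Fix a $t\in\mathbb R$ and let $c := \max\{Y_i - x_it;\ 1\le i\le n\}$. Then, since $\sum_i d_iF(y) = 0$ and the integrand vanishes for $y\ge c$,

(3)  $K(t) = \int\Big[\sum_i d_iI(Y_i\le y + x_it)\Big]^2dy = \sum_i\sum_j d_id_j\int I[\max(Y_j - x_jt,\ Y_i - x_it)\le y<c]\,dy = -\sum_i\sum_j d_id_j\max(Y_j - x_jt,\ Y_i - x_it)$.

Using the relationship

(4)  $2\max(a,b) = a + b + |a-b|$, $a, b\in\mathbb R$,

and the assumption $\sum_i d_i = 0$, one obtains

(5)  $K(t) = -\tfrac12\sum_i\sum_j d_id_j\,|Y_j - Y_i - (x_j - x_i)t|$.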
If $d_i = x_i - \bar x$ in (5), then the corresponding $\hat\beta$ is asymptotically equivalent to the Wilcoxon type R-estimator of $\beta$, as was shown by Williamson (1979). The result will also follow from the general asymptotic theory of Sections 5.5 and 5.6.
If $d_i = x_i - \bar x$, $1\le i\le n$, and $x_i = 0$, $1\le i\le r$; $x_i = 1$, $r+1\le i\le n$, then (1.1.1) becomes the two sample location model and

$K(t) = \dfrac{r(n-r)}{n^2}\sum_{j=r+1}^n\sum_{i=1}^r|Y_j - Y_i - t|$ + a r.v. constant in t.

Consequently, here $\hat\beta = \mathrm{med}\{Y_j - Y_i;\ r+1\le j\le n,\ 1\le i\le r\}$, the usual Hodges-Lehmann estimator. The fact that in the two sample location model the Cramér-von Mises type m.d. estimator of the location parameter is the Hodges-Lehmann estimator was first noted by Fine (1966).
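A small numerical check of this two sample reduction: the t-dependent part of $K(t)$ is proportional to $\sum_{j>r}\sum_{i\le r}|Y_j-Y_i-t|$, whose minimizers are the medians of the $r(n-r)$ differences. The data and function names below are hypothetical and serve only to illustrate the claim.

```python
import statistics


def shift_objective(y1, y2, t):
    """sum_{j in sample 2} sum_{i in sample 1} |Y_j - Y_i - t|: up to the positive
    factor r(n - r)/n^2 and a term free of t, this is K(t) in the two sample model."""
    return sum(abs(b - a - t) for b in y2 for a in y1)


def hodges_lehmann_shift(y1, y2):
    """Median of the r(n - r) differences Y_j - Y_i."""
    return statistics.median([b - a for b in y2 for a in y1])


y1 = [1, 2, 4]        # x_i = 0 (first sample)
y2 = [3, 5, 6, 10]    # x_i = 1 (second sample)
est = hodges_lehmann_shift(y1, y2)
print(est)  # 3.5
```

With an even number of differences, any point between the two middle differences minimizes the objective; `statistics.median` returns their midpoint.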

Note that a relation like (5) is true for general p and G. That is, suppose that $p\ge1$, $G\in\mathcal{DI}(\mathbb R)$ and (5.2.21) holds; then $\forall\,t\in\mathbb R^p$,

(6)  $K(t) = -\tfrac12\sum_{j=1}^p\sum_{i=1}^n\sum_{k=1}^n d_{ij}d_{kj}\,|G((Y_i - x_i't)-) - G((Y_k - x_k't)-)|$.

To prove this, proceed as in (3) to conclude first that

$K(t) = -\sum_{j=1}^p\sum_{i=1}^n\sum_{k=1}^n d_{ij}d_{kj}\,G(\max(Y_i - x_i't,\ Y_k - x_k't)-)$.

Now use the fact that $G((a\vee b)-) = G(a-)\vee G(b-)$, (5.2.21) and (4) to obtain (6). Clearly, formula (6) can be used to compute $\hat\beta$ in general.
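To see formula (6) at work, the sketch below (p = 1, hypothetical data and function names) evaluates $K(t)=\int[\sum_i d_iI(a_i\le y)]^2\,dG(y)$, $a_i = Y_i - x_it$, directly as a sum over the intervals between consecutive $a_i$, and compares it with the closed form (6). It assumes $\sum_i d_i = 0$ and a continuous G (so that $G(a-) = G(a)$); with $G(y)=y$ it also reproduces (5).

```python
import math


def k_direct(a, d, G):
    """K(t) = int [sum_i d_i I(a_i <= y)]^2 dG(y), a_i = Y_i - x_i t, computed
    interval by interval. Assumes sum(d) == 0 and G continuous, so the integrand
    vanishes off [min a, max a) and G(y-) = G(y)."""
    pts = sorted(set(a))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        level = sum(di for ai, di in zip(a, d) if ai <= left)  # value on [left, right)
        total += level * level * (G(right) - G(left))
    return total


def k_formula6(a, d, G):
    """Formula (6) for p = 1: -(1/2) sum_i sum_k d_i d_k |G(a_i) - G(a_k)|."""
    return -0.5 * sum(di * dk * abs(G(ai) - G(ak))
                      for ai, di in zip(a, d) for ak, dk in zip(a, d))


x = [1, 2, 3, 4, 5]
y = [1.3, 1.9, 3.4, 3.8, 5.6]
xbar = sum(x) / len(x)
d = [xi - xbar for xi in x]  # centered weights, so (5.2.21) holds
t = 0.9
a = [yi - xi * t for xi, yi in zip(x, y)]
for G in (lambda v: v, lambda v: 1 / (1 + math.exp(-v))):
    print(k_direct(a, d, G), k_formula6(a, d, G))
```

The closed form needs only the $n^2$ pairwise terms and no numerical integration, which is what makes (6) convenient for computing the estimator.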
Next consider $K^+$. To simplify the exposition, fix a $t\in\mathbb R^p$ and let $r_i := Y_i - x_i't$, $1\le i\le n$; $b := \max\{r_i, -r_i;\ 1\le i\le n\}$. Then from (5.2.19) we obtain

$K^+(t) = \sum_{j=1}^p\int\Big[\sum_i d_{ij}\{I(r_i\le y) - I(-r_i<y)\}\Big]^2\,dG(y)$.

Observe that the integrand is zero for $y>b$. Now expand the quadratic and integrate term by term, noting that G may have jumps, to obtain

$K^+(t) = \sum_{j=1}^p\sum_i\sum_k d_{ij}d_{kj}\{2G(r_i\vee -r_k) - 2J(r_i)I(r_i>-r_k) - G((r_i\vee r_k)-) - G(-r_i\vee -r_k)\}$,

where $J(y) := G(y) - G(y-)$, the jump in G at $y\in\mathbb R$. Once again use the fact that $G(a\vee b) = G(a)\vee G(b)$, (4), the invariance of the double sum under permutation and the definition of $\{r_i\}$ to conclude that




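(7)  $K^+(t) = \sum_{j=1}^p\sum_i\sum_k d_{ij}d_{kj}\Big[|G(Y_i - x_i't) - G(-Y_k + x_k't)| + J(Y_i - x_i't)\{1 - 2I(Y_i - x_i't > -Y_k + x_k't)\} - \tfrac12\{|G((Y_i - x_i't)-) - G((Y_k - x_k't)-)| + |G(-Y_i + x_i't) - G(-Y_k + x_k't)|\}\Big]$.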


Before proceeding further it is convenient to recall at this time the definition of symmetry for a $G\in\mathcal{DI}(\mathbb R)$.

Definition 5.3.1. An arbitrary $G\in\mathcal{DI}(\mathbb R)$, inducing a $\sigma$-finite measure on the Borel line $(\mathbb R,\mathcal B)$, is said to be symmetric around 0 if

(8)  $|G(y) - G(x)| = |G(-x-) - G(-y-)|$, $\forall\,x, y\in\mathbb R$,

or

(9)  $dG(y) = -dG(-y)$, $\forall\,y\in\mathbb R$.

If G is continuous then (8) is equivalent to

(10)  $|G(y) - G(x)| = |G(-x) - G(-y)|$, $\forall\,x, y\in\mathbb R$.

Conversely, if (10) holds then G is symmetric around 0 and continuous.


Now suppose that G satisfies (8). Then (7) simplifies to

(7')  $K^+(t) = \sum_{j=1}^p\sum_i\sum_k d_{ij}d_{kj}\Big[|G(Y_i - x_i't) - G(-Y_k + x_k't)| + J(Y_i - x_i't)\{1 - 2I(Y_i - x_i't > -Y_k + x_k't)\} - |G(-Y_i + x_i't) - G(-Y_k + x_k't)|\Big]$.

And if G satisfies (10), then we obtain the relatively simpler expression

(7*)  $K^+(t) = \sum_{j=1}^p\sum_i\sum_k d_{ij}d_{kj}\Big[|G(Y_i - x_i't) - G(-Y_k + x_k't)| - |G(Y_i - x_i't) - G(Y_k - x_k't)|\Big]$.


Upon specializing (7*) to the case $G(y)=y$, $p=1$, $d_i\equiv n^{-1/2}$ and $x_i\equiv1$ we obtain

$K^+(t) = n^{-1}\sum_i\sum_k\{|Y_i + Y_k - 2t| - |Y_i - Y_k|\}$,

and the corresponding minimizer is the well celebrated median of the pairwise means $\{(Y_i+Y_j)/2;\ 1\le i\le j\le n\}$.

Suppose we specialize (1.1.1) to a completely randomized design with p treatments, i.e., take

$x_{ij} = 1$, $m_{j-1}+1\le i\le m_j$,
$\quad\ = 0$, otherwise,

where $1\le n_j\le n$ is the jth sample size, $m_0 = 0$, $m_j = n_1+\dots+n_j$, $1\le j\le p$, $m_p = n$. Then, upon taking $G(y)=y$, $d_{ij} = x_{ij}$ in (7*), we obtain

$K^+(t) = \sum_{j=1}^p\sum_{i=1}^{n_j}\sum_{k=1}^{n_j}\{|Y_{ij} + Y_{kj} - 2t_j| - |Y_{ij} - Y_{kj}|\}$, $t\in\mathbb R^p$,

where $Y_{ij}$ = the ith observation from the jth treatment, $1\le j\le p$. Consequently, $\hat\beta^+ = (\hat\beta_1^+,\dots,\hat\beta_p^+)'$, where $\hat\beta_j^+ = \mathrm{med}\{(Y_{ij}+Y_{kj})/2;\ 1\le i\le k\le n_j\}$, $1\le j\le p$. That is, in a completely randomized design with p treatments, $\hat\beta^+$ corresponding to the weights $d_{ij} = x_{ij}$ and $G(y)=y$ is the vector of Hodges-Lehmann estimators. A similar remark applies to the randomized block, factorial and other similar designs.


The class of estimators $\hat\beta^+$ also includes the well celebrated least absolute deviation (l.a.d.) estimator. To see this, assume that the errors are continuous. Choose $G = \delta_0$, the measure degenerate at 0, in $K^+$, to obtain

(11)  $K^+(t) = \sum_{j=1}^p\Big[\sum_{i=1}^n d_{ij}\{I(Y_i - x_i't\le0) - I(Y_i - x_i't>0)\}\Big]^2 = \sum_{j=1}^p\Big(\sum_{i=1}^n d_{ij}\,\mathrm{sgn}(Y_i - x_i't)\Big)^2$, w.p.1, $\forall\,t\in\mathbb R^p$.

Upon choosing $d_{ij} = x_{ij}$, one sees that the r.h.s. of (11) is precisely the square of the norm of the a.e. differential of the sum of absolute deviations $D(t) = \sum_{i=1}^n|Y_i - x_i't|$, $t\in\mathbb R^p$. Clearly the minimizer of $D(t)$ is also a minimizer of $K^+(t)$ of (11).
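For $p=1$ this link can be checked directly. The sketch below (hypothetical names and data) finds the l.a.d. estimate by scanning the kinks $Y_i/x_i$ of the convex, piecewise linear $D(t)$, and verifies that $K^+(t) = (\sum_i x_i\,\mathrm{sgn}(Y_i - x_it))^2$ of (11), with $d_i = x_i$, attains its smallest value there. Exact rational arithmetic (`fractions.Fraction`) is used so that a residual lying exactly at zero gets sign 0 rather than a rounding artefact.

```python
from fractions import Fraction


def sgn(v):
    return (v > 0) - (v < 0)


def lad_objective(x, y, t):
    """D(t) = sum_i |Y_i - x_i t| for p = 1."""
    return sum(abs(yi - xi * t) for xi, yi in zip(x, y))


def k_plus_lad(x, y, t):
    """(11) with p = 1, d_i = x_i and G the point mass at 0."""
    return sum(xi * sgn(yi - xi * t) for xi, yi in zip(x, y)) ** 2


def lad_p1(x, y):
    """D is convex and piecewise linear with kinks at Y_i / x_i, so some kink minimizes it."""
    kinks = [yi / xi for xi, yi in zip(x, y) if xi != 0]
    return min(kinks, key=lambda t: lad_objective(x, y, t))


x = [1, 2, 3, 4]
y = [Fraction(11, 10), Fraction(19, 10), Fraction(32, 10), Fraction(39, 10)]
t_lad = lad_p1(x, y)
print(t_lad)  # 39/40
```

Here $K^+$ is constant between kinks, so comparing its values at the kinks, at the midpoints between them and at two outlying points covers all of its possible values.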
Any one of the expressions among (7), (7') or (7*) may be used to compute $\hat\beta^+$ for a general G. From these expressions it becomes apparent that the computation of $\hat\beta^+$ is similar to the computation of maximum likelihood estimators. It is also apparent from the above discussion that both classes $\{\hat\beta\}$ and $\{\hat\beta^+\}$ include rather interesting estimators. On the one hand we have a smooth unbounded G, viz., $G(y)=y$, giving rise to Hodges-Lehmann type estimators, and on the other hand a highly discrete G, viz., $G=\delta_0$, giving rise to the l.a.d. estimator. Any large sample theory should be general enough to cover both of these cases.


We now address the question of the existence of these estimators in the case $p=1$. As before, when $p=1$ we write unbold letters for scalars, and $d_i$, $x_i$ for $d_{i1}$, $x_{i1}$, $1\le i\le n$. Before stating the result we need to define

$\Gamma(y) := \sum_{i=1}^n I(x_i=0)\,d_i\{I(Y_i\le y) - I(-Y_i<y)\}$, $y\in\mathbb R$.

Arguing as for (7) we obtain, with $b = \max\{Y_i, -Y_i;\ 1\le i\le n\}$,

(12)  $\int|\Gamma|\,dG \le \sum_{i=1}^n I(x_i=0)\,|d_i|\,[G(b-) - G(Y_i-) + G(b-) - G(-Y_i)] < \infty$.

Moreover, directly from (7) we can conclude that

(13)  $\int\Gamma^2\,dG < \infty$.

Both (12) and (13) hold for all $n\ge1$, for every sample $\{Y_i\}$ and for all real numbers $\{d_i\}$.


Lemma 5.3.1. Assume that (1.1.1) with $p=1$ holds. In addition, assume that either

(14a)  $d_ix_i\ge0$, $\forall\,1\le i\le n$, or (14b) $d_ix_i\le0$, $\forall\,1\le i\le n$.

Then a minimizer of $K^+$ exists if either Case 1: $G(\mathbb R)=\infty$, or Case 2: $G(\mathbb R)<\infty$ and $d_i = 0$ whenever $x_i = 0$, $1\le i\le n$.

If G is continuous then a minimizer is measurable.


Proof. The proof uses Fatou's Lemma and the D.C.T. Specialize (5.2.19) to the case $p=1$ to obtain

$K^+(t) = \int\Big[\sum_i d_i\{I(Y_i\le y + x_it) - I(-Y_i<y - x_it)\}\Big]^2\,dG(y)$.

Let $K^+(y,t)$ denote the integrand without the square. Then

$K^+(y,t) = \Gamma(y) + M^+(y,t)$,

where

(15)  $M^+(y,t) = \sum_i I(x_i>0)\,d_i\{I(Y_i\le y + x_it) - I(-Y_i<y - x_it)\} + \sum_i I(x_i<0)\,d_i\{I(Y_i\le y + x_it) - I(-Y_i<y - x_it)\}$.

Clearly, $\forall\,y, t\in\mathbb R$,

$|M^+(y,t)| \le \sum_i I(x_i\ne0)\,|d_i| =: a$, say.

Hence

(16)  $\Gamma(y) - a \le K^+(y,t) \le \Gamma(y) + a$, $\forall\,y, t\in\mathbb R$.

Suppose that (14a) holds. Then, from (15) it follows that $\forall\,y\in\mathbb R$,

$M^+(y,t)\to\pm a$ as $t\to\pm\infty$,

so that $\forall\,y\in\mathbb R$,

(17)  $K^+(y,t)\to\Gamma(y)\pm a$, as $t\to\pm\infty$.

Now consider Case 1. If $a=0$ then either all $x_i = 0$ or $d_i = 0$ for those i for which $x_i\ne0$. In either case one obtains from (13) and (16) that $\forall\,t\in\mathbb R$, $K^+(t) = \int\Gamma^2\,dG<\infty$, and hence a minimizer trivially exists.


If $a>0$ then, from (12) and (13) it follows that $\int(\Gamma(y)\pm a)^2\,dG(y) = \infty$, and by (17) and Fatou's Lemma, $\liminf_{t\to\pm\infty}K^+(t) = \infty$. On the other hand, by (7), $K^+(t)$ is a finite number for every real t, and hence a minimizer exists.

Next, consider Case 2. Here, clearly, $\Gamma\equiv0$. From (16), we obtain

$\{K^+(y,t)\}^2\le a^2$, $\forall\,y, t\in\mathbb R$,

and hence

$K^+(t)\le a^2G(\mathbb R)$, $\forall\,t\in\mathbb R$.

By (17), $K^+(y,t)\to\pm a$, as $t\to\pm\infty$. By the D.C.T. we obtain

$K^+(t)\to a^2G(\mathbb R)$, as $|t|\to\infty$,

thereby proving the existence of a minimizer of $K^+$ in Case 2.

The continuity of G together with (7*) shows that $K^+$ is a continuous function on $\mathbb R$, thereby ensuring the measurability of a minimizer, by Corollary 2.1 of Brown and Purves (1973). This completes the proof in the case of (14a). It is exactly similar when (14b) holds; hence no details will be given for that case. □


Remark 5.3.1. Observe that in some cases minimizers of $K^+$ could be measurable even if G is not continuous. For example, in the case of the l.a.d. estimator, G is degenerate at 0, yet a measurable minimizer exists.

The above proof is essentially due to Dhar (1991a). Dhar (1991b) gives proofs of the existence of the classes of estimators $\{\hat\beta\}$ and $\{\hat\beta^+\}$ of (5.2.11) and (5.2.20) for $p>1$, among other results. These proofs are somewhat complicated and will not be reproduced here. In both of these papers Dhar carries out some finite sample simulation studies and concludes that both $\hat\beta$ and $\hat\beta^+$ corresponding to $G(y)=y$ show some superiority over some of the well known estimators.

Note that (14a) is a priori satisfied by the weights $d_i = x_i$. □


Now we discuss $\tilde\beta$ of (5.2.14). We rewrite

$Q(t) = n^{-1}\sum_i\sum_j L_{ij}\int\{I(Y_i\le y) - H_i(y - x_i't)\}\{I(Y_j\le y) - H_j(y - x_j't)\}\,dG(y)$,

where $L_{ij} = 1 - L(((i\vee j)/n)-)$, $1\le i, j\le n$. Differentiating Q w.r.t. t under the integral sign (which can be easily justified assuming $H_i$ has density $h_i$ and some other mild conditions) we obtain

(18)  $\dot Q(t) = 2n^{-1}\sum_i\sum_j L_{ij}\int\{I(Y_i\le y) - H_i(y - x_i't)\}\,h_j(y - x_j't)\,dG(y)\,x_j$.

Specialize this to the case $G(y)=y$, $L(s)=s$ (so that $L_{ij} = \min(n-i, n-j)/n$), $p=1$, $x_i\equiv1$, and integrate by parts, to obtain

$\dot Q(t) = -2n^{-2}\sum_i\sum_j\min(n-i,\ n-j)\{H_i(Y_i-t) - 1/2\} = -n^{-2}\sum_i(n-i)(n+i-1)\{H_i(Y_i-t) - 1/2\}$.

Now suppose further that $H_i\equiv F$. Then $\tilde\beta$ is a solution t of

(19)  $\sum_i(n-i)(n+i-1)\{F(Y_i-t) - 1/2\} = 0$.

Compare this $\tilde\beta$ with $\hat\theta$ of (2). Clearly $\tilde\beta$ given by (19) is a weighted M-estimator of the location parameter, whereas $\hat\theta$ given by (2) is an ordinary M-estimator. Of course, if in (18) we choose $L(s) = I(s\ge1)$, $p=1$, $x_i\equiv1$, $G(y)=y$, then $\tilde\beta = \hat\theta$. In general $\tilde\beta$ may be obtained as a solution of $\dot Q(t) = 0$.
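A short sketch of the weighted M-estimator (19), assuming a standard normal F purely for illustration (the function names are hypothetical). Note that the weight $(n-i)(n+i-1)$ attached to $Y_i$ depends on the position i of the observation in the data, not on its rank; the accompanying check confirms the summation step $2\sum_j\min(n-i,n-j) = (n-i)(n+i-1)$ used above.

```python
import math


def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def weighted_m_location(ys, F=normal_cdf, iters=200):
    """Solve (19): sum_i (n - i)(n + i - 1){F(Y_i - t) - 1/2} = 0 by bisection.
    The weight of Y_i depends on its position i in the data, not on its rank.
    Assumes F(0) = 1/2, so that score(min Y) >= 0 >= score(max Y)."""
    n = len(ys)
    w = [(n - i) * (n + i - 1) for i in range(1, n + 1)]

    def score(t):
        return sum(wi * (F(y - t) - 0.5) for wi, y in zip(w, ys))

    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ys = [1.0, 2.0, 5.0]  # weights 6, 4, 0: the last observation gets no weight
print(weighted_m_location(ys))
```

Since the last observation always receives weight zero, reordering the same data can change $\tilde\beta$, in contrast with $\hat\theta$ of (2).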
Next, consider $\hat\beta^*$ of (5.2.23). For the time being focus on the case $p=1$ and $d_i = x_i - \bar x$. Assume, without loss of generality, that the data are so arranged that $x_1\le x_2\le\dots\le x_n$. Let $\mathcal A := \{(Y_j - Y_i)/(x_j - x_i);\ i<j,\ x_i<x_j\}$, $t_0 := \min\{t;\ t\in\mathcal A\}$ and $t_1 := \max\{t;\ t\in\mathcal A\}$. Then for $x_i<x_j$, $t<t_0$ implies $t<(Y_j - Y_i)/(x_j - x_i)$, so that $R_{it}<R_{jt}$. In other words, the residuals $\{Y_j - tx_j;\ 1\le j\le n\}$ are naturally ordered for all $t<t_0$, w.p.1, assuming the continuity of the errors. Hence, with $T(s,t)$ denoting the $T_{1d}(s,t)$ of (5.2.22), we obtain for $t<t_0$,

$T(s,t) = \sum_{i=1}^k d_i$, $k/n\le s<(k+1)/n$, $1\le k\le n-1$,
$\qquad\ = 0$, $0\le s<1/n$, $s=1$.
Hence,

$K^*(t) = \sum_{k=1}^{n-1}w_k\Big\{\sum_{i=1}^k d_i\Big\}^2$, $t<t_0$,

where $w_k = [L((k+1)/n) - L(k/n)]$, $1\le k\le n-1$. Consequently

$K^*(t_0-) = \sum_{k=1}^{n-1}w_k\Big\{\sum_{i=1}^k d_i\Big\}^2$.

Similarly, using the fact $\sum_i d_i = 0$, one obtains

$K^*(t) = \sum_{k=1}^{n-1}w_k\Big\{\sum_{i=1}^{n-k}d_i\Big\}^2 = K^*(t_1+)$, $t>t_1$.


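As t crosses over $t_0$ only one pair of adjacent residuals change their ranks. Let $x_j<x_{j+1}$ denote their respective regression constants. Then

$K^*(t_0-) - K^*(t_0+) = \sum_{k=1}^{n-1}w_k\Big\{\sum_{i=1}^k d_i\Big\}^2 - \Big[\sum_{k\ne j}w_k\Big\{\sum_{i=1}^k d_i\Big\}^2 + w_j\Big\{d_{j+1} + \sum_{i=1}^{j-1}d_i\Big\}^2\Big] = w_j\Big[\Big\{\sum_{i=1}^j d_i\Big\}^2 - \Big\{d_{j+1} + \sum_{i=1}^{j-1}d_i\Big\}^2\Big]$.

But $x_1\le x_2\le\dots\le x_n$, $x_j<x_{j+1}$ and $\sum_i d_i = 0$ imply, for $n\ge3$,

$\sum_{i=1}^j d_i < d_{j+1} + \sum_{i=1}^{j-1}d_i$ and $\sum_{i=1}^j d_i + d_{j+1} + \sum_{i=1}^{j-1}d_i = \sum_{i=1}^{j+1}d_i + \sum_{i=1}^{j-1}d_i < 0$,

the latter because the partial sums $\sum_{i=1}^k d_i$, $1\le k\le n-1$, of the nondecreasing sequence $\{d_i\}$, which sums to zero and is not identically zero, are negative. Thus $\{\sum_{i=1}^j d_i\}^2 > \{d_{j+1} + \sum_{i=1}^{j-1}d_i\}^2$ and, provided $w_j>0$, $K^*(t_0-) > K^*(t_0+)$. Similarly it follows that $K^*(t_1+) > K^*(t_1-)$. Consequently, $\hat\beta_1^*$ and $\hat\beta_2^*$ are finite, where

$\hat\beta_1^* = \min\{t\in\mathcal A;\ K^*(t+) = \inf_{\lambda\in\mathcal A^c}K^*(\lambda)\}$,

$\hat\beta_2^* = \max\{t\in\mathcal A;\ K^*(t-) = \inf_{\lambda\in\mathcal A^c}K^*(\lambda)\}$,

and where $\mathcal A^c$ denotes the complement of $\mathcal A$. Then $\hat\beta^*$ can be uniquely defined by the relation $\hat\beta^* = (\hat\beta_1^* + \hat\beta_2^*)/2$.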
This $\hat\beta^*$ corresponding to $L(s)=s$ was studied by Williamson (1979, 1982). In general this estimator is asymptotically relatively more efficient than Wilcoxon type R-estimators, as will be seen later on in Section 5.6.

There does not seem to be such a nice characterization for $p>1$ and general D satisfying (5.2.21). However, proceeding as in the derivation of (6), a computational formula for $K^*$ of (5.2.22) can be obtained to be

(20)  $K^*(t) = -\tfrac12\sum_{j=1}^p\sum_{i=1}^n\sum_{k=1}^n d_{ij}d_{kj}\,|L((R_{it}/n)-) - L((R_{kt}/n)-)|$.

This formula is valid for a general $\sigma$-finite measure L and can be used to compute $\hat\beta^*$.
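As a sanity check on (20), the sketch below takes $p=1$, $L(s)=s$ and centered weights, and compares $\int_0^1T^2(s,t)\,ds$ computed directly (on $[k/n,(k+1)/n)$ exactly the residuals of rank at most k contribute) with the closed form (20). It assumes distinct residuals; the data, the values of t and the function names are hypothetical. Exact fractions make the two results identical, not just close.

```python
from fractions import Fraction


def ranks(values):
    """Ranks 1..n (values assumed distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def k_star_direct(x, y, d, t):
    """int_0^1 T(s, t)^2 ds for L(s) = s, with T(s, t) = sum_i d_i I(R_it <= n s).
    On [k/n, (k+1)/n) exactly the residuals of rank <= k contribute."""
    n = len(y)
    r = ranks([yi - xi * t for xi, yi in zip(x, y)])
    total = Fraction(0)
    for k in range(1, n):
        level = sum(di for di, ri in zip(d, r) if ri <= k)
        total += level * level * Fraction(1, n)
    return total


def k_star_formula20(x, y, d, t):
    """Formula (20) for p = 1 and L(s) = s (continuous, so L(u-) = L(u))."""
    n = len(y)
    r = ranks([yi - xi * t for xi, yi in zip(x, y)])
    return -Fraction(1, 2) * sum(di * dk * abs(Fraction(ri - rk, n))
                                 for di, ri in zip(d, r) for dk, rk in zip(d, r))


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 6, 4]
d = [xi - 3 for xi in x]  # x_i - mean(x): sums to zero
for t in (Fraction(0), Fraction(1, 4), Fraction(5, 2)):
    print(t, k_star_direct(x, y, d, t), k_star_formula20(x, y, d, t))
```

Because $K^*$ depends on t only through the ranks, it is a step function of t; this is why the minimizer is described above through one-sided limits at the points of $\mathcal A$.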


We now turn to the m.d. estimator defined at (5.2.26) and (5.2.27). Let $d_i = x_i - \bar x$. The first observation one makes is that for $t\in\mathbb R$,

$D_n(t) = \sup_{y\in\mathbb R}\Big|\sum_{i=1}^n d_iI(Y_i\le y + td_i)\Big| = \sup_{0\le s\le1}\Big|\sum_{i=1}^n d_iI(R_{it}\le ns)\Big|$.

Proceeding as in the above discussion pertaining to $\hat\beta^*$, assume, without loss of generality, that the data are so arranged that $x_1\le x_2\le\dots\le x_n$, so that $d_1\le d_2\le\dots\le d_n$. Let $\mathcal T := \{(Y_j - Y_i)/(d_j - d_i);\ d_i<0,\ d_j>0,\ 1\le i<j\le n\}$.

It can be proved that $D_n^+$ ($D_n^-$) is a left continuous non-decreasing (right continuous non-increasing) step function on $\mathbb R$ whose points of discontinuity are a subset of $\mathcal T$. Moreover, if $-\infty = t_0 < t_1 < t_2 < \dots < t_m < t_{m+1} = \infty$ denote the ordered members of $\mathcal T$, then $D_n^+(t_1-) = 0 = D_n^-(t_m+)$ and $D_n^+(t_m+) = \sum_i d_i^+ = D_n^-(t_1-)$, where $d_i^+ = \max(d_i, 0)$. Consequently, the following entities are finite:

$\hat\beta_{s1} := \inf\{t\in\mathbb R;\ D_n^+(t)\ge D_n^-(t)\}$,  $\hat\beta_{s2} := \sup\{t\in\mathbb R;\ D_n^+(t)\le D_n^-(t)\}$.

Note that $\hat\beta_{s2}\ge\hat\beta_{s1}$ w.p.1. One can now take $\hat\beta_s = (\hat\beta_{s1} + \hat\beta_{s2})/2$.
Williamson (1979) provides the proofs of the above claims and obtains 
the asymptotic distribution of fs. This estimator is the precise 
generalization of the m.d. estimator of the two sample location parameter of 
Rao, Schuster and Littell (1975). Its asymptotic distribution is the same as 
that of their estimator. 
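The sup-norm distance is equally easy to evaluate: by the display defining $D_n(t)$, it is the largest absolute partial sum of the $d_i$ taken in increasing order of the residuals $Y_i - t d_i$. The sketch below, with hypothetical names, evaluates $D_n$ once between consecutive pairwise slopes (a superset of $\mathcal{C}$, so no jump is missed) and takes the midpoint of the range of minimizing points as a crude proxy for $\beta_s$. It does not compute $D_n^{\pm}$, $\beta_{s1}$ or $\beta_{s2}$ separately, and it assumes untied data.

```python
import numpy as np


def sup_distance(t, x, y):
    """D_n(t): largest |partial sum| of d_i taken in increasing order of Y_i - t d_i."""
    d = x - x.mean()
    order = np.argsort(y - t * d)
    partial = np.concatenate(([0.0], np.cumsum(d[order])))   # S_0, S_1, ..., S_n
    return float(np.max(np.abs(partial)))


def beta_sup(x, y):
    """Crude proxy for beta_s: midpoint of the grid points that minimise D_n."""
    d = x - x.mean()
    n = len(d)
    # all pairwise slopes (a superset of the set C), so every jump of D_n is covered
    cuts = np.unique([(y[j] - y[i]) / (d[j] - d[i])
                      for i in range(n) for j in range(i + 1, n) if d[j] != d[i]])
    points = np.concatenate(([cuts[0] - 1.0], (cuts[:-1] + cuts[1:]) / 2.0, [cuts[-1] + 1.0]))
    values = np.array([sup_distance(t, x, y) for t in points])
    best = points[np.isclose(values, values.min())]
    return float((best.min() + best.max()) / 2.0)


rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 0.1 * rng.standard_normal(40)
print(beta_sup(x, y))
```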
We shall now discuss some distributional properties of the above m.d. estimators. To facilitate this discussion, let $\tilde\beta$ denote any one of the estimators defined at (5.2.11), (5.2.20), (5.2.23) and (5.2.27). As in Section 4.3, we shall write $\tilde\beta(X, Y)$ to emphasize the dependence on the data $\{(x_i', Y_i);\ 1 \le i \le n\}$. It also helps to think of the defining distances $K$, $K^+$, etc. as functions of the residuals. Thus we shall sometimes write $K(Y - Xt)$ etc. for $K(t)$ etc. Let $\mathcal{K}$ stand for any one of $K$, $K^+$ or $K^*$ of (5.2.10), (5.2.19) and (5.2.22). To begin with, observe that


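$$\mathcal{K}(t - b) = \mathcal{K}(Y + Xb - Xt), \quad \forall\, t, b \in \mathbb{R}^p, \tag{21}$$
so that
$$\tilde\beta(X, Y + Xb) = \tilde\beta(X, Y) + b, \quad \forall\, b \in \mathbb{R}^p. \tag{22}$$

Consequently, the distribution of $\tilde\beta - \beta$ does not depend on $\beta$.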
The distance measure $Q$ of (5.2.13) does not satisfy (21), and hence the distribution of the minimizer of $Q$, centered at $\beta$, will generally depend on $\beta$.
In general, the classes of estimators $\{\hat\beta\}$ and $\{\beta^+\}$ are not scale invariant. However, as can be readily seen from (6) and (7), the class $\{\hat\beta\}$ corresponding to $G(y) = y$, $H_i \equiv F$ and those $D$ that satisfy (5.2.21), and the class $\{\beta^+\}$ corresponding to $G(y) = y$ and general $D$, are scale invariant in the sense of (4.3.9).

An interesting property of all of the above m.d. estimators is that they are invariant under nonsingular transformations of the design matrix $X$. That is,

$$\tilde\beta(XB, Y) = B^{-1}\tilde\beta(X, Y) \quad \text{for every } p \times p \text{ nonsingular matrix } B.$$

A similar statement holds for the minimizer of $Q$.

We shall end this section by discussing the symmetry property of 
these estimators. In the following lemma it is implicitly assumed that all 
integrals involved are finite. Some sufficient conditions for that to happen 
will unfold as we proceed in this chapter. 


Lemma 5.3.2. Let (1.1.1) hold with the actual and the modeled d.f. of $e_i$ equal to $H_i$, $1 \le i \le n$.

(i) If either
(ia) $\{H_i, 1 \le i \le n\}$ and $G$ are symmetric around 0 and $\{H_i, 1 \le i \le n\}$ are continuous,
or
(ib) $d_{ij} = -d_{n-i+1,j}$, $x_{ij} = -x_{n-i+1,j}$ and $H_i \equiv F$, for all $1 \le i \le n$, $1 \le j \le p$,
then $\hat\beta$ and the minimizer of $Q$ are symmetrically distributed around $\beta$, whenever they exist uniquely.

(ii) If $\{H_i, 1 \le i \le n\}$ and $G$ are symmetric around 0 and either $\{H_i, 1 \le i \le n\}$ are continuous or $G$ is continuous, then $\beta^+$ is symmetrically distributed around $\beta$, whenever it exists uniquely.


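Proof. In view of (22) there is no loss of generality in assuming that the true $\beta$ is 0.

Suppose that (ia) holds. Then $\hat\beta(X, Y) \overset{d}{=} \hat\beta(X, -Y)$. But, by definition (5.2.11), $\hat\beta(X, -Y)$ is the minimizer of $K(-Y - Xt)$ w.r.t. $t$. Observe that, for all $t \in \mathbb{R}^p$,

$$\begin{aligned}
K(-Y - Xt) &= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{ij}\{I(-Y_i \le y + x_i't) - H_i(y)\}\Big]^2 dG(y)\\
&= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{ij}\{1 - I(Y_i < -y - x_i't) - H_i(y)\}\Big]^2 dG(y)\\
&= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{ij}\{I(Y_i < y - x_i't) - H_i(y-)\}\Big]^2 dG(y)
\end{aligned}$$

by the symmetry of $\{H_i\}$ and $G$. Now use the continuity of $\{H_i\}$ to conclude that, w.p.1,

$$K(-Y - Xt) = K(Y + Xt), \quad \forall\, t \in \mathbb{R}^p,$$

so that $\hat\beta(X, -Y) = -\hat\beta(X, Y)$, w.p.1, and the claim follows because $-\hat\beta(X, Y) = \operatorname{argmin}\{K(Y + Xt);\ t \in \mathbb{R}^p\}$.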


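Now suppose that (ib) holds. Then

$$\begin{aligned}
K(Y + Xt) &= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{n-i+1,j}\{I(Y_i \le y + x_{n-i+1}'t) - F(y)\}\Big]^2 dG(y)\\
&\overset{d}{=} \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{n-i+1,j}\{I(Y_{n-i+1} \le y + x_{n-i+1}'t) - F(y)\}\Big]^2 dG(y)\\
&= K(Y - Xt), \quad \forall\, t \in \mathbb{R}^p,
\end{aligned}$$

where the middle step uses the fact that, with $\beta = 0$, $(Y_1, \dots, Y_n)$ and $(Y_n, \dots, Y_1)$ have the same joint distribution. This shows that $-\hat\beta(X, Y) \overset{d}{=} \hat\beta(X, Y)$, as required. The proof for the minimizer of $Q$ is similar.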
Proof of (ii). Again, $\beta^+(X, Y) \overset{d}{=} \beta^+(X, -Y)$, because of the symmetry of $\{H_i\}$. But,

$$\begin{aligned}
K^+(-Y - Xt) &= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{ij}\{I(-Y_i \le y + x_i't) - 1 + I(-Y_i \le -y + x_i't)\}\Big]^2 dG(y)\\
&= \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{ij}\{I(Y_i < y - x_i't) - 1 + I(Y_i < -y - x_i't)\}\Big]^2 dG(y)\\
&= K^+(Y + Xt), \quad \forall\, t \in \mathbb{R}^p,
\end{aligned}$$

w.p.1, if either $\{H_i\}$ or $G$ is continuous. □


5.4. ASYMPTOTICS OF MINIMUM DISPERSION ESTIMATORS: A 
GENERAL CASE 


This section gives a general overview of an asymptotic theory useful in inference based on minimizing an objective function of the data and parameter in general models. It is a self-contained section of broad interest. In an inferential problem consisting of a vector of $n$ observations $\zeta_n = (\zeta_{n1}, \dots, \zeta_{nn})'$, not necessarily independent, and a $p$-dimensional parameter $\theta \in \Omega \subset \mathbb{R}^p$, an estimator of $\theta$ is often based on an objective function $M_n(\zeta_n, \theta)$, herein called a dispersion. In this section an estimator of $\theta$ obtained by minimizing $M_n(\zeta_n, \cdot)$ will be called a minimum dispersion estimator.

Typically the sequence of dispersions $M_n$ admits the following approximate quadratic structure. Writing $M_n(\theta)$ for $M_n(\zeta_n, \theta)$, it often turns out that $M_n(\theta) - M_n(\theta_0)$, under $\theta_0$, is asymptotically like a quadratic form in $(\theta - \theta_0)$, for $\theta$ close to $\theta_0$ in a certain sense, with the coefficient of the linear term equal to a random vector which is asymptotically normally distributed. This approximation in turn is used to obtain the asymptotic distribution of minimum dispersion estimators.

The two classical examples of the above type are Gauss's least squares and Fisher's maximum likelihood estimators. In the former the dispersion $M_n$ is the error sum of squares, while in the latter $M_n$ equals $-\log L_n$, $L_n$ denoting the likelihood function of $\theta$ based on $\zeta_n$. In the least squares method, $M_n(\theta) - M_n(\theta_0)$ is exactly quadratic in $(\theta - \theta_0)$, uniformly in $\theta$ and $\theta_0$, and the random vector appearing in the linear term is typically asymptotically normally distributed. In the likelihood method, the celebrated locally asymptotically normal (l.a.n.) models of Le Cam (1960, 1986) obey the above type of approximate quadratic structure. Other well-known examples include the least absolute deviation and the minimum chi-square estimators.
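The exactness claim for least squares is easy to check. With $M_n(\theta) = \|Y - X\theta\|^2$ one has $M_n(\theta) - M_n(\theta_0) = (\theta - \theta_0)'S_n + \tfrac12(\theta - \theta_0)'W_n(\theta - \theta_0)$ with $S_n = -2X'(Y - X\theta_0)$ and $W_n = 2X'X$, and there is no remainder term. The simulated design and parameter values below are arbitrary choices made for illustration.

```python
import numpy as np


def ls_dispersion(theta, X, Y):
    """M_n(theta): error sum of squares."""
    r = Y - X @ theta
    return float(r @ r)


def quadratic_part(theta, theta0, X, Y):
    """(theta - theta0)'S + (1/2)(theta - theta0)'W(theta - theta0) for least squares."""
    S = -2.0 * X.T @ (Y - X @ theta0)   # linear-term coefficient S_n(theta0)
    W = 2.0 * X.T @ X                   # quadratic-term matrix W_n(theta0)
    h = theta - theta0
    return float(h @ S + 0.5 * h @ W @ h)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta0 = np.array([1.0, -2.0, 0.5])
Y = X @ theta0 + rng.standard_normal(50)
theta = theta0 + np.array([0.3, 0.1, -0.2])
lhs = ls_dispersion(theta, X, Y) - ls_dispersion(theta0, X, Y)
rhs = quadratic_part(theta, theta0, X, Y)
print(np.isclose(lhs, rhs))  # True: the expansion is exact
```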
The main purpose of this section is to unify the basic structure of the asymptotics underlying minimum dispersion estimators by exploiting the above type of common asymptotic quadratic structure inherent in most dispersions.

We now formulate general conditions for a given dispersion to be uniformly locally asymptotically quadratic (u.l.a.q.d.). Accordingly, let $\Omega$ be an open subset of $\mathbb{R}^p$ and $M_n$, $n \ge 1$, be a sequence of real-valued functions defined on $\mathbb{R}^n \times \Omega$ such that $M_n(\cdot, \theta)$ is measurable for each $\theta$. We shall often suppress the $\zeta_n$ coordinate in $M_n$ and write $M_n(\theta)$ for $M_n(\zeta_n, \theta)$. In order to state general conditions we need to define a sequence of neighborhoods $N_n(\theta_0) := \{\theta \in \Omega;\ \|\delta_n(\theta_0)(\theta - \theta_0)\| \le B\}$, where $\theta_0$ is a fixed parameter value in $\Omega$, $B$ is a finite number and $\{\delta_n(\theta_0)\}$ is a sequence of $p \times p$ symmetric positive definite matrices with norms $\|\delta_n(\theta_0)\|$ tending to infinity. Since $\theta_0$ is fixed, write $\delta_n$, $N_n$ for $\delta_n(\theta_0)$, $N_n(\theta_0)$, respectively. Similarly, let $P_n$ denote the probability distribution of $\zeta_n$ when $\theta = \theta_0$.


Definition 5.4.1. A sequence of dispersions $\{M_n(\theta), \theta \in N_n\}$, $n \ge 1$, is said to be u.l.a.q. (uniformly locally asymptotically quadratic) if it satisfies conditions (A1)–(A3) given below.

(A1) There exist a sequence of $p \times 1$ random vectors $S_n(\theta_0)$ and a sequence of $p \times p$, possibly random, matrices $W_n(\theta_0)$ such that, for every $0 < B < \infty$ and for all $\theta \in N_n$,

$$M_n(\theta) = M_n(\theta_0) + (\theta - \theta_0)'S_n(\theta_0) + \tfrac{1}{2}(\theta - \theta_0)'W_n(\theta_0)(\theta - \theta_0) + u_p(1),$$

where $u_p(1)$ is a sequence of stochastic processes in $\theta$ converging to zero, uniformly in $\theta \in N_n$, in $P_n$-probability.

(A2) There exists a $p \times p$ non-singular, possibly random, matrix $W(\theta_0)$ such that

$$\delta_n^{-1}W_n(\theta_0)\delta_n^{-1} = W(\theta_0) + o_p(1), \quad (P_n).$$

(A3) There exists a $p \times 1$ r.v. $Y(\theta_0)$ such that

$$\mathcal{L}_n\big(\delta_n^{-1}S_n(\theta_0),\ \delta_n^{-1}W_n(\theta_0)\delta_n^{-1}\big) \to \mathcal{L}\big(Y(\theta_0), W(\theta_0)\big),$$

where $\mathcal{L}_n$, $\mathcal{L}$ denote joint probability distributions under $P_n$ and in the limit, respectively.


Denote the conditions (A1), (A2) by (A1*) and (A2*), respectively, whenever $W$ is non-random in these conditions. A sequence of dispersions $\{M_n\}$ is called uniformly locally asymptotically normal quadratic (u.l.a.n.q.) if (A1*) and (A2*) hold and if (A3*), instead of (A3), holds, where (A3*) is as follows:
(A3*) There exists a positive definite $p \times p$ matrix $\Sigma(\theta_0)$ such that

$$\delta_n^{-1}S_n(\theta_0) \to_d N(0, \Sigma(\theta_0)), \quad (P_n).$$

If (A1) holds without the uniformity requirement and (A2), (A3) hold, then we call the given sequence $M_n$ locally asymptotically quadratic (l.a.q.). If (A1*) holds without the uniformity requirement and (A2*), (A3*) hold, then the given sequence $M_n$ is called locally asymptotically normal quadratic (l.a.n.q.).


In the case $M_n(\theta) = -\log L_n(\theta)$, the conditions (A1*) (without uniformity), (A2*) and (A3*) with $\|\delta_n\| = O(n^{1/2})$ determine the celebrated l.a.n. models of Le Cam (1960, 1986). For this particular case, $W(\theta_0)$, $\Sigma(\theta_0)$ and the limiting Fisher information matrix $\mathcal{F}(\theta_0)$, whenever it exists, are the same.
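As an illustration of the l.a.n. case (the model is chosen here for illustration and is not one used in the text), take the Cauchy location model with $\theta_0 = 0$, $M_n(\theta) = \sum_i \log\{1 + (X_i - \theta)^2\}$, which is $-\log L_n$ up to an additive constant, and $\delta_n = n^{1/2}$. Then $\delta_n^{-1}W_n\delta_n^{-1}$ is the average of the second derivatives, and the variance of the summands of $\delta_n^{-1}S_n$ estimates $\Sigma$. Both should be close to the Fisher information $1/2$ of this model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
e = rng.standard_cauchy(n)   # observations X_i - theta0 under theta0 = 0

# Summands of S_n(theta0) = dM_n/dtheta and W_n(theta0) = d^2 M_n/dtheta^2 at theta0,
# for M_n(theta) = sum_i log(1 + (X_i - theta)^2).
score_terms = -2.0 * e / (1.0 + e ** 2)
curvature_terms = 2.0 * (1.0 - e ** 2) / (1.0 + e ** 2) ** 2

W_hat = curvature_terms.mean()   # delta_n^{-1} W_n delta_n^{-1} with delta_n = sqrt(n)
Sigma_hat = score_terms.var()    # estimates the variance of delta_n^{-1} S_n
print(W_hat, Sigma_hat)          # both should be close to 1/2
```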


In the above general formulation, $M_n$ is an arbitrary dispersion satisfying (A1)–(A3) or (A1*)–(A3*). In the latter case the three matrices $W(\theta_0)$, $\Sigma(\theta_0)$ and $\mathcal{F}(\theta_0)$ are not necessarily identical. The l.a.n.q. dispersions can thus be viewed as a generalization of the l.a.n. models.


Typically, in the classical i.i.d. setup the normalizing matrix $\delta_n$ is of the order $n^{1/2}$, whereas in the linear regression model (1.1.1) it is of the order $(X'X)^{1/2}$. In general $\delta_n$ will depend on $\theta_0$ and is determined by the order of the asymptotic magnitude of $S_n(\theta_0)$.


An example where the full strength of (A1)–(A3) is realized is obtained by considering the least squares dispersion in an explosive autoregression model where, for some $\rho > 1$, $X_i = \rho X_{i-1} + e_i$, $i \ge 1$, and where $\{e_i, i \ge 1\}$ are i.i.d. r.v.'s. For details see Koul and Pflug (1990).
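A short simulation shows why this model is special. With arbitrary choices $\rho = 1.1$, $n = 200$, standard normal errors and $X_0 = 0$ (all assumptions of this sketch), the least squares estimate $\sum X_iX_{i-1}/\sum X_{i-1}^2$ converges at a geometric rate, so the normalization $\delta_n$ is of order $\rho^n$ rather than $n^{1/2}$.

```python
import numpy as np


def ls_autoregression(x):
    """Least squares estimate of rho in X_i = rho X_{i-1} + e_i."""
    past, current = x[:-1], x[1:]
    return float(past @ current / (past @ past))


rng = np.random.default_rng(4)
rho, n = 1.1, 200
x = np.zeros(n + 1)                       # X_0 = 0, an assumption of this sketch
for i in range(1, n + 1):
    x[i] = rho * x[i - 1] + rng.standard_normal()

rho_hat = ls_autoregression(x)
print(abs(rho_hat - rho))                 # very small: the error is roughly of order rho**(-n)
```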


We now turn to the asymptotic distribution of the minimum dispersion estimators. Let $\{M_n\}$ be a sequence of u.l.a.q.d.'s. Define

$$\hat\theta_n := \operatorname{argmin}\{M_n(t),\ t \in \Omega\}. \tag{1}$$

Our goal is to investigate the asymptotic behavior of $\hat\theta_n$ and $M_n(\hat\theta_n)$. Akin to the study of the asymptotic distribution of m.l.e.'s, we must first ensure that there is a $\hat\theta_n$ satisfying (1) such that

$$\|\delta_n(\hat\theta_n - \theta_0)\| = O_p(1). \tag{2}$$

Unfortunately the u.l.a.q. assumptions are not enough to guarantee (2). One set of additional assumptions that ensures (2) is the following.


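(A4) For every $\varepsilon > 0$ there exist $0 < Z_\varepsilon < \infty$ and $N_{1\varepsilon}$ such that

$$P_n\big(|M_n(\theta_0)| \le Z_\varepsilon\big) \ge 1 - \varepsilon, \quad \forall\, n \ge N_{1\varepsilon}.$$

(A5) For every $\varepsilon > 0$ and $0 < a < \infty$ there exist an $N_{2\varepsilon}$ and a $b$ (depending on $\varepsilon$ and $a$) such that

$$P_n\Big(\inf_{\|\delta_n(t - \theta_0)\| > b} M_n(t) \ge a\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N_{2\varepsilon}.$$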
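It is convenient to let

$$Q_n(\theta, \theta_0) := (\theta - \theta_0)'S_n(\theta_0) + \tfrac{1}{2}(\theta - \theta_0)'W_n(\theta_0)(\theta - \theta_0), \quad \theta \in \mathbb{R}^p,$$

and $\tilde\theta_n := \operatorname{argmin}\{Q_n(\theta, \theta_0),\ \theta \in \mathbb{R}^p\}$. Clearly, $\tilde\theta_n$ must satisfy the relation

$$\mathcal{W}_n\,\delta_n(\tilde\theta_n - \theta_0) = -\delta_n^{-1}S_n(\theta_0), \tag{3}$$

where $\mathcal{W}_n := \delta_n^{-1}W_n\delta_n^{-1}$, with $W_n = W_n(\theta_0)$.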
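Relation (3) is just the first-order condition for the quadratic $Q_n$, since $\tilde\theta_n = \theta_0 - W_n^{-1}S_n$ when $W_n$ is symmetric and nonsingular. The check below uses a randomly generated $S_n$, a symmetric positive definite $W_n$ and $\delta_n = n^{1/2}I_p$; these are illustrative choices, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 3, 400
delta = np.sqrt(n) * np.eye(p)                 # illustrative delta_n = n^{1/2} I_p
M = rng.standard_normal((p, p))
W_n = n * (M @ M.T + np.eye(p))                # symmetric positive definite W_n(theta0)
S_n = np.sqrt(n) * rng.standard_normal(p)      # linear-term coefficient S_n(theta0)
theta0 = np.zeros(p)

# Q_n(theta, theta0) has gradient S_n + W_n (theta - theta0), so its minimiser is:
theta_tilde = theta0 - np.linalg.solve(W_n, S_n)

delta_inv = np.linalg.inv(delta)
calW_n = delta_inv @ W_n @ delta_inv           # the matrix written as script W_n in (3)
lhs = calW_n @ delta @ (theta_tilde - theta0)
rhs = -delta_inv @ S_n
print(np.allclose(lhs, rhs))                   # True
```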

Some generality is achieved by making the following assumption.

(A6) $\|\delta_n(\tilde\theta_n - \theta_0)\| = O_p(1)$.

Note that (A2) and (A3) imply (A6). We now state and prove


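Theorem 5.4.1. Let the dispersions $M_n$ satisfy (A1), (A4)–(A6). Then, under $P_n$,

$$\big|(\hat\theta_n - \tilde\theta_n)'\delta_n\mathcal{W}_n\delta_n(\hat\theta_n - \tilde\theta_n)\big| = o_p(1), \tag{4}$$

$$\inf_{\theta \in \Omega} M_n(\theta) - M_n(\theta_0) = -\tfrac{1}{2}(\tilde\theta_n - \theta_0)'W_n(\tilde\theta_n - \theta_0) + o_p(1). \tag{5}$$

Consequently, if (A6) is replaced by (A2) and (A3), then

$$\delta_n(\hat\theta_n - \theta_0) \to_d -\{W(\theta_0)\}^{-1}Y(\theta_0) \tag{6}$$

and

$$\inf_{\theta \in \Omega} M_n(\theta) - M_n(\theta_0) = -\tfrac{1}{2}S_n(\theta_0)'\delta_n^{-1}\mathcal{W}_n^{-1}\delta_n^{-1}S_n(\theta_0) + o_p(1). \tag{7}$$

If, instead of (A1)–(A3), $M_n$ satisfies (A1*)–(A3*), and if (A4) and (A5) hold, then (4)–(7) also hold and

$$\delta_n(\hat\theta_n - \theta_0) \to_d N(0, \Gamma(\theta_0)), \tag{8}$$

where $\Gamma(\theta_0) = \{W(\theta_0)\}^{-1}\Sigma(\theta_0)\{W(\theta_0)\}^{-1}$.

Proof. Let $Z_\varepsilon$ be as in (A4). Choose an $a > Z_\varepsilon$ in (A5). Then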
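$$\begin{aligned}
&\Big[|M_n(\theta_0)| \le Z_\varepsilon,\ \inf_{\|h\| > b} M_n(\theta_0 + \delta_n^{-1}h) \ge a\Big]\\
&\quad\subset \Big[\inf_{\|h\| \le b} M_n(\theta_0 + \delta_n^{-1}h) \le Z_\varepsilon,\ \inf_{\|h\| > b} M_n(\theta_0 + \delta_n^{-1}h) \ge a\Big]\\
&\quad\subset \Big[\inf_{\|h\| > b} M_n(\theta_0 + \delta_n^{-1}h) > \inf_{\|h\| \le b} M_n(\theta_0 + \delta_n^{-1}h)\Big].
\end{aligned}$$

Hence, by (A4) and (A5), for any $\varepsilon > 0$ there exists a $b$ (now depending only on $\varepsilon$) such that, for all $n \ge N_{1\varepsilon} \vee N_{2\varepsilon}$,

$$P_n\Big(\inf_{\|h\| > b} M_n(\theta_0 + \delta_n^{-1}h) > \inf_{\|h\| \le b} M_n(\theta_0 + \delta_n^{-1}h)\Big) \ge 1 - \varepsilon. \tag{9}$$

This in turn ensures the validity of (2). Having verified (2), (A1) now yields

$$M_n(\hat\theta_n) = M_n(\theta_0) + Q_n(\hat\theta_n, \theta_0) + o_p(1), \quad (P_n). \tag{10}$$

From (A6), the inequality

$$\Big|\inf_{\theta \in N_n} M_n(\theta) - \inf_{\theta \in N_n}[M_n(\theta_0) + Q_n(\theta, \theta_0)]\Big| \le \sup_{\theta \in N_n}\big|M_n(\theta) - [M_n(\theta_0) + Q_n(\theta, \theta_0)]\big|$$

and (A1), we obtain

$$M_n(\hat\theta_n) = M_n(\theta_0) + Q_n(\tilde\theta_n, \theta_0) + o_p(1), \quad (P_n). \tag{11}$$

Now (10) and (11) readily yield

$$Q_n(\hat\theta_n, \theta_0) = Q_n(\tilde\theta_n, \theta_0) + o_p(1), \quad (P_n),$$

which is precisely equivalent to the statement (4). The claim (5) follows from (3) and (11). The rest is obvious. □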


Remark 5.4.1. Roughly speaking, assumption (A5) says that the smallest value of $M_n(\theta)$ outside of $N_n$ can be made asymptotically arbitrarily large with arbitrarily large probability. Assumption (A4) means that the sequence of r.v.'s $\{M_n(\theta_0)\}$ is bounded in probability. This assumption is usually verified by an application of the Markov inequality in the case $E_n|M_n(\theta_0)| = O(1)$, where $E_n$ denotes the expectation under $P_n$. In some applications $M_n(\theta_0)$ converges weakly to a r.v., which also implies (A4). Often the verification of (A5) is rendered easy by an application of a variant of the C-S inequality. Examples of this appear in the next section, when dealing with the m.d. estimators of the previous section. □
We now discuss minimum dispersion tests of a simple hypothesis, briefly and without many details. Consider the simple hypothesis $H_0: \theta = \theta_0$. In the special case when $M_n$ is $-\log L_n$, the likelihood ratio statistic for testing $H_0$ is given by $-2\inf\{M_n(\theta) - M_n(\theta_0);\ \theta \in \Omega\}$. Thus, given a general dispersion function $M_n$, we are motivated to base a test of $H_0$ on the statistic

$$T_n := -2\inf\{M_n(\theta) - M_n(\theta_0);\ \theta \in \Omega\},$$

with large values of $T_n$ being significant.

To study the asymptotic null distribution of $T_n$, note that by (7),

$$T_n = S_n(\theta_0)'\delta_n^{-1}\mathcal{W}_n^{-1}\delta_n^{-1}S_n(\theta_0) + o_p(1), \quad (P_n).$$

Let $Y$, $W$ etc. stand for $Y(\theta_0)$, $W(\theta_0)$, etc.


Proposition 5.4.1. Under (A1)–(A3), (A4), (A5), the asymptotic null distribution of $T_n$ is the same as that of $Y'W^{-1}Y$.

Under (A1*)–(A3*), (A4), (A5), the asymptotic null distribution of $T_n$ is the same as that of $Z'BZ$, where $Z$ is a $N(0, I_{p \times p})$ r.v. and $B = \Sigma^{1/2}W^{-1}\Sigma^{1/2}$. □


Remark 5.4.2. Clearly, if $W(\theta_0) = \Sigma(\theta_0)$, then the asymptotic null distribution of $T_n$ is $\chi^2_p$. However, if $W \ne \Sigma$, the limit distribution of $T_n$ is not a chi-square. We shall not discuss the distribution of $T_n$ under alternatives. □
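Proposition 5.4.1 and Remark 5.4.2 can be illustrated by simulation. The sketch below uses hypothetical values $\Sigma = \begin{pmatrix}2 & 0.5\\ 0.5 & 1\end{pmatrix}$ and $W = 2I_2$, draws $Y \sim N(0, \Sigma)$, and compares the Monte Carlo mean and variance of $Y'W^{-1}Y$ with those of $\sum_k \lambda_k\chi^2_{1,k}$, where the $\lambda_k$ are the eigenvalues of $B = \Sigma^{1/2}W^{-1}\Sigma^{1/2}$ (the same as those of $W^{-1}\Sigma$). Since the mean $\sum_k\lambda_k = 1.5$ differs from $p = 2$, the limit here is not $\chi^2_2$.

```python
import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical Sigma(theta0)
W = 2.0 * np.eye(2)                          # hypothetical W(theta0), W != Sigma
W_inv = np.linalg.inv(W)

Y = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000)
T = np.einsum("ij,jk,ik->i", Y, W_inv, Y)    # Y' W^{-1} Y for every draw

# Weights of the chi-square(1) mixture: eigenvalues of W^{-1} Sigma, equal to those of
# B = Sigma^{1/2} W^{-1} Sigma^{1/2}.
lam = np.linalg.eigvals(W_inv @ Sigma).real
print(T.mean(), lam.sum())                   # Monte Carlo mean vs. exact mean 1.5
print(T.var(), 2.0 * np.sum(lam ** 2))       # Monte Carlo variance vs. exact 2.75
```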


A class of examples of u.l.a.n.q.d.'s where (A1*)–(A3*), (A4) and (A5) are satisfied, typically with $W \ne \Sigma$, is given by Huber's M-dispersions for the model (1.1.1), viz.,

$$M_n(t) = \sum_{i=1}^{n}\rho(Y_i - x_i't), \quad t \in \mathbb{R}^p,$$

where $\rho$ is a convex function on $\mathbb{R}$ with almost everywhere derivative $\psi$. As mentioned in Chapter 4, the estimators obtained by minimizing $M_n$ have been studied extensively in the literature; see Huber and the references therein. These estimators include the least squares and the l.a.d. estimators of $\beta$. Now, let $g_r(t) := \int[\psi(x) - \psi(x - t)]^r\,dF(x)$, $t \in \mathbb{R}$, $r = 1, 2$, and suppose that $F$ and $\psi$ are such that $\int\psi\,dF = 0$, $0 < \int\psi^2\,dF < \infty$, $g_1$ is continuously differentiable at 0 and $g_2$ is continuous at 0. Then it can be shown, under (NX), that Huber's dispersion is l.a.n.q. with

$$\theta = \beta, \quad \delta_n = (X'X)^{1/2}, \quad S_n(\beta) = -\sum_{i=1}^{n} x_i\,\psi(Y_i - x_i'\beta),$$

$$W_n(\beta) = \dot g_1(0)\,X'X, \quad W = \dot g_1(0)\,I_{p \times p}, \quad \text{and} \quad \Sigma = \Big(\int\psi^2\,dF\Big) I_{p \times p},$$

where $\dot g_1$ denotes the derivative of $g_1$. This, together with the convexity of $\rho$ and a result in Rockafellar (1970), yields that the above dispersion is u.l.a.n.q.d. See also Heiler and Weiler (1988) and Pollard (1991).
For $\rho(x) = |x|$ and $F$ continuous, $\psi(x) = \operatorname{sgn}(x)$, so that $g_1(t) = 2\{F(t) - F(0)\}$ and $g_2(t) = 4|F(t) - F(0)|$. The condition on $g_1$ now translates to the usual condition on $F$ in terms of its density $f$ at 0, since $\dot g_1(0) = 2f(0)$. For $\rho(x) = x^2$, $\psi(x) = 2x$ and $g_1(t) = 2t$, so that $g_1$ is trivially continuously differentiable with $\dot g_1(0) = 2$. Note that in general $W \ne \Sigma$ unless $\dot g_1(0) = \int\psi^2\,dF$, which is the case when $\psi$ is related to the likelihood scores.
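For the l.a.d. case with standard normal $F$ (an illustrative choice), the sketch below compares the closed form $g_1(t) = 2\{F(t) - F(0)\}$ with a Monte Carlo average and computes $\dot g_1(0)$ by a central difference. It gives $\dot g_1(0) = 2f(0) = (2/\pi)^{1/2} \approx 0.798$, while $\int\psi^2\,dF = 1$, so here $W = \dot g_1(0)I_{p\times p} \ne \Sigma = I_{p\times p}$. The function names are hypothetical.

```python
import math

import numpy as np


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def g1_sign(t):
    """Closed form of g_1(t) for psi = sgn and F = N(0, 1): 2{F(t) - F(0)}."""
    return 2.0 * (normal_cdf(t) - normal_cdf(0.0))


def g1_monte_carlo(t, size=400_000, seed=7):
    """Monte Carlo estimate of int [sgn(x) - sgn(x - t)] dF(x)."""
    e = np.random.default_rng(seed).standard_normal(size)
    return float(np.mean(np.sign(e) - np.sign(e - t)))


h = 1e-5
g1_dot = (g1_sign(h) - g1_sign(-h)) / (2.0 * h)   # central difference for g_1'(0)
sigma = 1.0                                        # int psi^2 dF for psi = sgn
print(g1_dot, math.sqrt(2.0 / math.pi), sigma)
```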

The next section is devoted to verifying (A1)–(A5) for the various dispersions introduced in Section 5.2.


5.5. ASYMPTOTIC UNIFORM QUADRATICITY 
In this section we shall give sufficient conditions under which $K_D$ and $K_D^+$ of Section 5.2 satisfy (5.4.A1), (5.4.A4) and (5.4.A5), and $K_D^*$ and $Q$ of Section 5.2 satisfy (5.4.A1). As is seen from the previous section, this will bring us a step closer to obtaining the asymptotic distributions of the various m.d. estimators introduced in Section 5.2.


To begin with we shall focus on (5.4.A1) for $K_D$, $K_D^+$ and $K_D^*$. Our basic objective is to study the asymptotic distribution of $\hat\beta_D$ when the actual d.f.'s of $\{e_{ni}, 1 \le i \le n\}$ are $\{F_{ni}, 1 \le i \le n\}$ but we model them to be $\{H_{ni}, 1 \le i \le n\}$. Similarly, we wish to study the asymptotic distribution of $\beta_D^+$ when actually the errors may not be symmetric but we model them to be so. To achieve these objectives it is necessary to obtain the asymptotic results under as general a setting as possible. This of course makes the exposition that follows look somewhat complicated. The results thus obtained will enable us to study not only the asymptotic distributions of these estimators but also some of their robustness properties. With this in mind we proceed to state our assumptions.


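(1) $X$ satisfies (NX).

(2) With $d_{(j)}$ denoting the $j$th column of $D$, $\|d_{(j)}\|^2 > 0$ for at least one $j$; $\|d_{(j)}\|^2 = 1$ for all those $j$ for which $\|d_{(j)}\|^2 > 0$, $1 \le j \le p$.

(3) $\{F_{ni}, 1 \le i \le n\}$ admit densities $\{f_{ni}, 1 \le i \le n\}$ w.r.t. $\lambda$.

(4) $\{G_n\}$ is a sequence in $\mathcal{DI}(\mathbb{R})$.

(5) With $d_{ni} := (d_{ni1}, \dots, d_{nip})'$, the $i$th row of $D$, $1 \le i \le n$,
$$\int\sum_{i=1}^{n}\|d_{ni}\|^2F_{ni}(1 - F_{ni})\,dG_n = O(1).$$

(6) With $\gamma_n := \sum_i \|d_{ni}\|^2 f_{ni}$,
$$\limsup_n \int_{a_n}^{b_n}\!\!\int\gamma_n(y + x)\,dG_n(y)\,dx = 0$$
for any real sequences $\{a_n\}$, $\{b_n\}$ with $a_n < b_n$, $b_n - a_n \to 0$.

(7) With $d_{nij}^+ := \max(0, d_{nij})$, $d_{nij}^- := d_{nij}^+ - d_{nij}$, $1 \le j \le p$; $c_{ni} := Ax_{ni}$, $\kappa_{ni} := \|c_{ni}\|$, $1 \le i \le n$; for all $\delta > 0$ and all $\|v\| \le B$,
$$\limsup_n \sum_{j=1}^{p}\int\Big[\sum_{i=1}^{n} d_{nij}^{\pm}\{F_{ni}(y + v'c_{ni} + \delta\kappa_{ni}) - F_{ni}(y + v'c_{ni} - \delta\kappa_{ni})\}\Big]^2 dG_n(y) \le k\delta^2,$$
for each of $d^{\pm}_{nij} = d^{+}_{nij}$ and $d^{\pm}_{nij} = d^{-}_{nij}$, where $k$ is a constant not depending on $v$ and $\delta$.

(8) With $R_{nj} := \sum_i d_{nij}x_{ni}f_{ni}$ and $\nu_{nj} := AR_{nj}$, $1 \le j \le p$,
$$\sum_{j=1}^{p}\int\|\nu_{nj}\|^2\,dG_n = O(1).$$

(9) With $\mu_{nj}(y, u) := \sum_i d_{nij}F_{ni}(y + c_{ni}'u)$, for each $u \in \mathbb{R}^p$,
$$\sum_{j=1}^{p}\int[\mu_{nj}(y, u) - \mu_{nj}(y, 0) - u'\nu_{nj}(y)]^2\,dG_n(y) = o(1).$$

(10) With $m_{nj} := \sum_i d_{nij}[F_{ni} - H_{ni}]$, $1 \le j \le p$, and $m_n := (m_{n1}, \dots, m_{np})'$,
$$\int\|m_n\|^2\,dG_n = O(1).$$

(11) With $\Gamma_n(y) := (\nu_{n1}(y), \dots, \nu_{np}(y)) = AX'\Lambda(y)D$, where $\Lambda$ is defined at (4.2.1), and with $\bar\Gamma_n := \int\Gamma_n g_n\,dG_n$, where $g_n \in L_r(G_n)$, $r = 1, 2$, $n \ge 1$, is such that $g_n \ge 0$,
$$0 < \liminf_n \int g_n^2\,dG_n \le \limsup_n \int g_n^2\,dG_n < \infty,$$
and such that there exists an $\alpha > 0$ satisfying
$$\liminf_n \inf\{\theta'\bar\Gamma_n\theta;\ \theta \in \mathbb{R}^p,\ \|\theta\| = 1\} \ge \alpha.$$

(12) Either
(a) $(\theta'd_{ni})(x_{ni}'A\theta) \ge 0$ for all $1 \le i \le n$ and all $\theta \in \mathbb{R}^p$, $\|\theta\| = 1$,
or
(b) $(\theta'd_{ni})(x_{ni}'A\theta) \le 0$ for all $1 \le i \le n$ and all $\theta \in \mathbb{R}^p$, $\|\theta\| = 1$.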
In most of the subsequent applications of the results obtained in this section, the sequence of integrating measures $\{G_n\}$ will be a fixed $G$. However, we formulate the results of this section in terms of sequences $\{G_n\}$ to allow extra generality. Note that if $G_n \equiv G$, $G \in \mathcal{DI}(\mathbb{R})$, then there always exists a $g \in L_r(G)$, $r = 1, 2$, such that $g > 0$ and $0 < \int g^2\,dG < \infty$.


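Define, for $y \in \mathbb{R}$, $u \in \mathbb{R}^p$, $1 \le j \le p$,

$$S_j^0(y, u) := \sum_{i=1}^{n} d_{nij}\,I(Y_{ni} \le y + c_{ni}'u), \qquad Y_j^0(y, u) := S_j^0(y, u) - \mu_j^0(y, u), \tag{13}$$

where $\mu_j^0 := \mu_{nj}$ of (9). Note that for each $j$, $S_j^0$, $\mu_j^0$, $Y_j^0$ are the same as in (2.3.2) applied to $X_{ni} = Y_{ni}$, $c_{ni} = Ax_{ni}$ and $d_{ni} = d_{nij}$, $1 \le i \le n$, $1 \le j \le p$.

Notation. For any functions $g, h: \mathbb{R}^{p+1} \to \mathbb{R}$,
$$|g_u - h_v|_n^2 := \int\{g(y, u) - h(y, v)\}^2\,dG_n(y).$$
Occasionally we write $|g|_n^2$ for $|g_0|_n^2$.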


Lemma 5.5.1. Let $Y_{n1}, \dots, Y_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$. Then (5) implies

$$E\sum_{j=1}^{p}|Y_{j0}^0|_n^2 = O(1). \tag{14}$$

Proof. By Fubini's Theorem,

$$E\sum_{j=1}^{p}|Y_{j0}^0|_n^2 = \int\sum_{i=1}^{n}\|d_{ni}\|^2F_{ni}(1 - F_{ni})\,dG_n, \tag{15}$$

and hence (5) implies the Lemma. □
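Identity (15) can be checked by simulation. The sketch takes, purely for illustration, $n = 5$, $p = 2$, random weights $d_{ij}$, independent $Y_i \sim N(\mu_i, 1)$ with distinct $\mu_i$, and $G$ equal to unit point masses at $-1, 0, 1$, so that the $dG$-integrals become finite sums.

```python
import math

import numpy as np

rng = np.random.default_rng(8)
n, p = 5, 2
D = rng.uniform(-1.0, 1.0, size=(n, p))     # illustrative weights d_ij
mu = np.linspace(-0.5, 0.5, n)              # Y_i ~ N(mu_i, 1): non-identical F_i
grid = np.array([-1.0, 0.0, 1.0])           # G = unit point masses at these points

normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
F = normal_cdf(grid[None, :] - mu[:, None])   # F[i, g] = F_i(y_g)

# Right-hand side of (15): int sum_i ||d_i||^2 F_i (1 - F_i) dG.
exact = float(np.sum((D ** 2).sum(axis=1)[:, None] * F * (1.0 - F)))

# Left-hand side of (15): E sum_j int [sum_i d_ij {I(Y_i <= y) - F_i(y)}]^2 dG(y).
reps = 200_000
Y = mu[None, :] + rng.standard_normal((reps, n))
centred = (Y[:, :, None] <= grid[None, None, :]).astype(float) - F[None, :, :]
process = np.einsum("rig,ij->rjg", centred, D)   # Y_j(y) at u = 0, per replicate
monte_carlo = float(np.mean(np.sum(process ** 2, axis=(1, 2))))
print(monte_carlo, exact)
```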


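Lemma 5.5.2. Let $\{Y_{ni}\}$ be as in Lemma 5.5.1. Then assumptions (1)–(4), (6)–(10) imply that, for every $0 < B < \infty$,

$$E\sup_{\|u\| \le B}\sum_{j=1}^{p}|Y_{ju}^0 - Y_{j0}^0|_n^2 = o(1). \tag{16}$$

Proof. By Fubini's Theorem, for all $u \in \mathcal{N}(B)$,

$$E\sum_{j=1}^{p}|Y_{ju}^0 - Y_{j0}^0|_n^2 \le \int\sum_{i=1}^{n}\|d_{ni}\|^2|F_{ni}(y + c_{ni}'u) - F_{ni}(y)|\,dG_n(y) \le \int_{-b_n}^{b_n}\Big(\int\gamma_n(y + x)\,dG_n(y)\Big)dx, \tag{17}$$

where $b_n = B\max_i\kappa_{ni}$ and $\gamma_n$ is as in (6). Therefore, by assumption (6),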
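$$E\sum_{j=1}^{p}|Y_{ju}^0 - Y_{j0}^0|_n^2 = o(1), \quad \forall\, u \in \mathbb{R}^p. \tag{18}$$

To complete the proof of (16), because of the compactness of $\mathcal{N}(B) := \{u \in \mathbb{R}^p;\ \|u\| \le B\}$, it suffices to show that for every $\varepsilon > 0$ there is a $\delta > 0$ such that, for every $v \in \mathcal{N}(B)$,

$$\limsup_n E\sup_{\|u - v\| \le \delta}\sum_{j=1}^{p}|L_{ju} - L_{jv}| \le \varepsilon, \tag{19}$$

where

$$L_{ju} := |Y_{ju}^0 - Y_{j0}^0|_n^2, \quad u \in \mathbb{R}^p,\ 1 \le j \le p.$$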


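Expand the quadratic and apply the C-S inequality to the cross-product terms to obtain

$$|L_{ju} - L_{jv}| \le |Y_{ju}^0 - Y_{jv}^0|_n^2 + 2|Y_{ju}^0 - Y_{jv}^0|_n\,|Y_{jv}^0 - Y_{j0}^0|_n, \quad 1 \le j \le p. \tag{20}$$

Moreover,

$$\begin{aligned}
|Y_{ju}^0 - Y_{jv}^0|_n^2 &\le 2\{|S_{ju}^0 - S_{jv}^0|_n^2 + |\mu_{ju}^0 - \mu_{jv}^0|_n^2\},\\
|S_{ju}^0 - S_{jv}^0|_n^2 &\le 2\{|S_{ju}^+ - S_{jv}^+|_n^2 + |S_{ju}^- - S_{jv}^-|_n^2\},\\
|\mu_{ju}^0 - \mu_{jv}^0|_n^2 &\le 2\{|\mu_{ju}^+ - \mu_{jv}^+|_n^2 + |\mu_{ju}^- - \mu_{jv}^-|_n^2\}, \quad 1 \le j \le p,
\end{aligned} \tag{21}$$

where $S_j^{\pm}$, $\mu_j^{\pm}$ are the $S_j^0$, $\mu_j^0$ with $d_{ij}$ replaced by $d_{ij}^{\pm}$, $d_{ij}^+ := \max(0, d_{ij})$, $d_{ij}^- := d_{ij}^+ - d_{ij}$, $1 \le i \le n$, $1 \le j \le p$.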

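Now, $\|u - v\| \le \delta$, the nonnegativity of $\{d_{ij}^{\pm}\}$ and the monotonicity of $\{F_i\}$ yield (use (2.3.15) here) that, for all $1 \le j \le p$,

$$|\mu_{ju}^{\pm} - \mu_{jv}^{\pm}|_n^2 \le \int\Big[\sum_{i=1}^{n} d_{ij}^{\pm}\{F_i(y + c_i'v + \delta\kappa_i) - F_i(y + c_i'v - \delta\kappa_i)\}\Big]^2 dG_n(y).$$

Therefore, by assumption (7),

$$\limsup_n \sup_{\|u - v\| \le \delta}\sum_{j=1}^{p}|\mu_{ju}^{\pm} - \mu_{jv}^{\pm}|_n^2 \le k\delta^2. \tag{22}$$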

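By the monotonicity of $S_j^{\pm}$ and (2.3.15), $\|u - v\| \le \delta$ implies that, for all $1 \le j \le p$ and $y \in \mathbb{R}$,

$$-\sum_{i=1}^{n} d_{ij}^{\pm}\,I(-\delta\kappa_i < Y_i - v'c_i - y \le 0) \le S_j^{\pm}(y, u) - S_j^{\pm}(y, v) \le \sum_{i=1}^{n} d_{ij}^{\pm}\,I(0 < Y_i - v'c_i - y \le \delta\kappa_i).$$

This in turn implies (using the fact that $a \le b \le c$ implies $b^2 \le a^2 + c^2$ for any reals $a, b, c$)

$$\begin{aligned}
\{S_j^{\pm}(y, u) - S_j^{\pm}(y, v)\}^2 &\le \Big\{\sum_i d_{ij}^{\pm}\,I(0 < Y_i - y - v'c_i \le \delta\kappa_i)\Big\}^2 + \Big\{\sum_i d_{ij}^{\pm}\,I(-\delta\kappa_i < Y_i - y - v'c_i \le 0)\Big\}^2\\
&\le 2\Big\{\sum_i d_{ij}^{\pm}\,I(-\delta\kappa_i < Y_i - y - v'c_i \le \delta\kappa_i)\Big\}^2
\end{aligned}$$

for all $1 \le j \le p$ and all $y \in \mathbb{R}$. Now use the fact that, for $a, b$ real, $(a + b)^2 \le 2a^2 + 2b^2$ to conclude that, for all $1 \le j \le p$,

$$\begin{aligned}
|S_{ju}^{\pm} - S_{jv}^{\pm}|_n^2 &\le 4\int\Big\{\sum_i d_{ij}^{\pm}[I(-\delta\kappa_i < Y_i - y - v'c_i \le \delta\kappa_i) - p_i(y, v, \delta)]\Big\}^2 dG_n(y)\\
&\quad + 4\int\Big\{\sum_i d_{ij}^{\pm}\,p_i(y, v, \delta)\Big\}^2 dG_n(y)\\
&= 4\{I_j + II_j\}, \quad \text{(say)},
\end{aligned} \tag{23}$$

where $p_i(y, v, \delta) := F_i(y + v'c_i + \delta\kappa_i) - F_i(y + v'c_i - \delta\kappa_i)$. But $(d_{ij}^{\pm})^2 \le d_{ij}^2$ for all $i$ and $j$ implies that

$$\begin{aligned}
E\sum_{j=1}^{p} I_j &= \sum_{j=1}^{p}\int\sum_{i=1}^{n}(d_{ij}^{\pm})^2\,p_i(y, v, \delta)\{1 - p_i(y, v, \delta)\}\,dG_n(y)\\
&\le \int\sum_{i=1}^{n}\|d_i\|^2\,p_i(y, v, \delta)\,dG_n(y) \le \int_{a_n}^{b_n}\!\!\int\gamma_n(y + s)\,dG_n(y)\,ds,
\end{aligned}$$

by (3) and Fubini, where $a_n = -(B + \delta)\max_i\kappa_i$, $b_n = (B + \delta)\max_i\kappa_i$, and where $\gamma_n$ is defined in (6). Therefore, by assumption (6),

$$E\sum_{j=1}^{p} I_j = o(1). \tag{24}$$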
From the definition of $II_j$ in (23) and assumption (7),

$$\limsup_n \sum_{j=1}^{p} II_j \le k\delta^2. \tag{25}$$

From (21)–(25), we obtain

$$\limsup_n E\sup_{\|u - v\| \le \delta}\sum_{j=1}^{p}|Y_{ju}^0 - Y_{jv}^0|_n^2 \le 40k\delta^2. \tag{26}$$

Thus, if we choose $0 < \delta < (\varepsilon/40k)^{1/2}$, then (19) will follow from (26), (20) and (18). This also completes the proof of (16).


To state the next theorem we need

$$\tilde K_D(t) := \sum_{j=1}^{p}\int\{Y_j^0(y, 0) + t'R_{nj}(y) + m_{nj}(y)\}^2\,dG_n(y), \quad t \in \mathbb{R}^p. \tag{27}$$

In (28) below, the $G$ in $K_D$ is assumed to have been replaced by the sequence $G_n$, just for extra generality.

Theorem 5.5.1. Let $Y_{n1}, \dots, Y_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$. Suppose that $\{X, F_{ni}, H_{ni}, D, G_n\}$ satisfy (1)–(10). Then, for every $0 < B < \infty$,

$$E\sup_{\|u\| \le B}|K_D(Au) - \tilde K_D(Au)| = o(1). \tag{28}$$


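Proof. Write $K$, $\tilde K$ etc. for $K_D$, $\tilde K_D$ etc. Note that

$$\begin{aligned}
K(Au) &= \sum_{j=1}^{p}\int\{S_j^0(y, u) - \mu_j^0(y) + m_j(y)\}^2\,dG_n(y)\\
&= \sum_{j=1}^{p}\int[Y_j^0(y, u) - Y_j^0(y) + Y_j^0(y) + u'\nu_j(y) + m_j(y) + \mu_j^0(y, u) - \mu_j^0(y) - u'\nu_j(y)]^2\,dG_n(y),
\end{aligned}$$

where $Y_j^0(y) = Y_j^0(y, 0)$ and $\mu_j^0(y) = \mu_j^0(y, 0)$. Expand the quadratic and use the C-S inequality on the cross-product terms to obtain

$$\begin{aligned}
|K(Au) - \tilde K(Au)| \le \sum_{j=1}^{p}\Big\{&|Y_{ju}^0 - Y_{j0}^0|_n^2 + |\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n^2\\
&+ 2|Y_{ju}^0 - Y_{j0}^0|_n\big[|Y_j^0 + u'\nu_j + m_j|_n + |\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n\big]\\
&+ 2|Y_j^0 + u'\nu_j + m_j|_n\,|\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n\Big\}.
\end{aligned} \tag{29}$$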
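In view of Lemmas 5.5.1, 5.5.2 and assumptions (8) and (10), (28) will follow from (29) if we prove

$$\sup_{\|u\| \le B}\sum_{j=1}^{p}|\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n^2 = o(1). \tag{30}$$

Let $\xi_{ju} := |\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n^2$, $1 \le j \le p$, $u \in \mathbb{R}^p$. In view of the compactness of $\mathcal{N}(B)$ and assumption (9), it suffices to prove that for every $\varepsilon > 0$ there is a $\delta > 0$ such that, for every $v \in \mathcal{N}(B)$,

$$\limsup_n \sup_{\|u - v\| \le \delta}\sum_{j=1}^{p}|\xi_{ju} - \xi_{jv}| \le \varepsilon. \tag{31}$$

But, since $\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j = \{\mu_{ju}^0 - \mu_{jv}^0 - (u - v)'\nu_j\} + \{\mu_{jv}^0 - \mu_{j0}^0 - v'\nu_j\}$,

$$|\xi_{ju} - \xi_{jv}| \le 2\{|\mu_{ju}^0 - \mu_{jv}^0|_n^2 + \|u - v\|^2\|\nu_j\|_n^2\} + 2\,\xi_{jv}^{1/2}\{|\mu_{ju}^0 - \mu_{jv}^0|_n + \|u - v\|\,\|\nu_j\|_n\}.$$

Hence, from (21), (22) and assumption (9),

$$\text{l.h.s. of (31)} \le 2(4k + a)\delta^2 =: k_0\delta^2,$$

where $a := \limsup_n \sum_{j=1}^{p}\|\nu_j\|_n^2$, which is finite by (8). Therefore, choosing $\delta < (\varepsilon/k_0)^{1/2}$ yields (31), hence (30), and therefore the Theorem. □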


Our next goal is to obtain an analogue of (28) for $K_D^+$. Before stating it rigorously, it helps to rewrite $K_D^+$ in terms of the processes $\{Y_j^0\}$ of (13) and the functions $\{\mu_j^0\}$ of (9). In fact, we have

$$\begin{aligned}
K_D^+(Au) &= \sum_{j=1}^{p}\int\Big[S_j^0(y, u) - \sum_i d_{ij} + S_j^0(-y, u)\Big]^2 dG_n(y)\\
&= \sum_{j=1}^{p}\int\big[Y_j^0(y, u) - Y_j^0(y) + Y_j^0(-y, u) - Y_j^0(-y)\\
&\qquad + \mu_j^0(y, u) - \mu_j^0(y) - u'\nu_j(y)\\
&\qquad + \mu_j^0(-y, u) - \mu_j^0(-y) - u'\nu_j(-y)\\
&\qquad + W_j^+(y) + u'\nu_j^+(y) + m_j^+(y)\big]^2\,dG_n(y),
\end{aligned}$$

where

$$W_j^+(y) := Y_j^0(y) + Y_j^0(-y), \qquad \nu_j^+(y) := \nu_j(y) + \nu_j(-y),$$

$$m_j^+(y) := \sum_i d_{ij}\{F_i(y) - 1 + F_i(-y)\} = \mu_j^0(y) + \mu_j^0(-y) - \sum_i d_{ij}, \quad y \in \mathbb{R},\ 1 \le j \le p.$$

Let

$$\tilde K_D^+(Au) := \sum_{j=1}^{p}\int[W_j^+ + m_j^+ + u'\nu_j^+]^2\,dG_n, \quad u \in \mathbb{R}^p. \tag{32}$$


Now, proceeding as in (29), one obtains a similar upper bound for $|K_D^+(Au) - \tilde K_D^+(Au)|$ involving terms like those on the r.h.s. of (29) and terms like $|Y_{ju}^0 - Y_{j0}^0|_{-n}$, $|\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_{-n}$, $\|\nu_j\|_{-n}$, $|Y_j^0|_{-n}$, where, for any function $h: \mathbb{R}^{p+1} \to \mathbb{R}$, $|h_u|_{-n}^2 := \int h^2(-y, u)\,dG_n(y)$. It thus becomes apparent that one needs analogues of Lemmas 5.5.1 and 5.5.2 with $G_n(\cdot)$ replaced by $G_n(-\cdot)$. That is, if conditions (5)–(10) are also assumed to hold for the measures $\{G_n(-\cdot)\}$, then analogues of these lemmas obviously hold. Alternatively, the statement of the following theorem and the details of its proof are considerably simplified if one assumes $G_n$ to be symmetric around zero, as we shall do for convenience. Before stating the theorem, we state


Lemma 5.5.3. Let $Y_{n1}, \dots, Y_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$. Assume that (1)–(4), (6), (7) hold, that $\{G_n\}$ satisfies (5.3.8), and that (33) holds, where

$$\int\sum_{i=1}^{n}\|d_{ni}\|^2\{F_{ni}(-y) + 1 - F_{ni}(y)\}\,dG_n(y) = O(1). \tag{33}$$

Then

$$E\sum_{j=1}^{p}|Y_{j0}^0|_{-n}^2 = O(1), \tag{34a}$$

and

$$E\sup_{\|u\| \le B}\sum_{j=1}^{p}|Y_{ju}^0 - Y_{j0}^0|_{-n}^2 = o(1), \quad \forall\, 0 < B < \infty. \tag{34b}$$

□

This lemma follows from Lemmas 5.5.1 and 5.5.2 because, under (5.3.8), the l.h.s.'s of (34a) and (34b) are equal to those of (14) and (16), respectively. The proof of the following theorem is similar to that of Theorem 5.5.1.


Theorem 5.5.2. Let $Y_{n1}, \dots, Y_{nn}$ be independent r.v.'s with respective d.f.'s $F_{n1}, \dots, F_{nn}$. Suppose that $\{X, F_{ni}, D, G_n\}$ satisfy (1)–(4), (6)–(9), (5.3.8) for all $n \ge 1$, (33), and that

$$\sum_{j=1}^{p}\int\{m_j^+(y)\}^2\,dG_n(y) = O(1). \tag{35}$$

Then, for all $0 < B < \infty$,

$$E\sup_{\|u\| \le B}|K_D^+(Au) - \tilde K_D^+(Au)| = o(1). \tag{36}$$

□


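Remark 5.5.1. Recall that we are interested in the asymptotic distribution of $A^{-1}(\hat\beta_D - \beta)$, which is a minimizer of $K_D(\beta + Au)$ w.r.t. $u$. Since $\hat\beta_D$ satisfies (5.3.22), there is no loss of generality in taking the true $\beta$ equal to 0. Then (28) asserts that $(1/2)K_D$ satisfies (5.4.A1) with

$$\theta_0 = 0, \quad \delta_n = A^{-1}, \quad S_n = -A^{-1}\mathcal{T}_n, \quad W_n = A^{-1}\mathcal{B}_nA^{-1}, \tag{37}$$

$$\mathcal{T}_n := -\int\Gamma_n(y)\{Y^0(y) + m_n(y)\}\,dG_n(y), \qquad \mathcal{B}_n := \int\Gamma_n(y)\Gamma_n(y)'\,dG_n(y),$$

where $\Gamma_n(y) = AX'\Lambda(y)D$, $\Lambda$ as in (4.2.1), $Y^0 := (Y_1^0, \dots, Y_p^0)'$ and $m_n := (m_{n1}, \dots, m_{np})'$.

In view of Lemma 5.5.1, assumptions (5) and (10) imply that $EK_D(0) = O(1)$, thereby ensuring the validity of (5.4.A4).

Similarly, (36) asserts that $(1/2)K_D^+$ satisfies (5.4.A1) with

$$\theta_0 = 0, \quad \delta_n = A^{-1}, \quad S_n = -A^{-1}\mathcal{T}_n^+, \quad W_n = A^{-1}\mathcal{B}_n^+A^{-1}, \tag{38}$$

$$\mathcal{T}_n^+ := -\int\Gamma_n^+(y)\{W^+(y) + m^+(y)\}\,dG_n(y), \qquad \mathcal{B}_n^+ := \int\Gamma_n^+(y)\Gamma_n^+(y)'\,dG_n(y),$$

where $\Gamma_n^+(y) := AX'\Lambda^+(y)D$, $\Lambda^+(y) := \Lambda(y) + \Lambda(-y)$, $y \in \mathbb{R}$, $W^+ := (W_1^+, \dots, W_p^+)'$ and $m^+ := (m_1^+, \dots, m_p^+)'$.

In view of (12), (31), (33) and (5.3.8), it follows that (5.4.A4) is satisfied by $K_D^+(0)$.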


Theorem 5.5.1 enables one to study the asymptotic distribution of $\hat\beta_D$ when in (1.1.1) the actual error d.f. $F_{ni}$ is not necessarily equal to the modeled d.f. $H_{ni}$, $1 \le i \le n$. Theorem 5.5.2 enables one to study the asymptotic distribution of $\beta_D^+$ when in (1.1.1) the error d.f. $F_{ni}$ is not necessarily symmetric around 0, but we model it to be so, $1 \le i \le n$. □


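So far we have not used assumptions (11) and (12). They will now be used to obtain (5.4.A5) for $K_D$ and $\tilde K_D$.

Lemma 5.5.4. In addition to the assumptions of Theorem 5.5.1, assume that (11) and (12) hold. Then, for every $\varepsilon > 0$ and $0 < z < \infty$, there exist an $N$ (depending only on $\varepsilon$) and a $B$ (depending on $\varepsilon$, $z$), $0 < B < \infty$, such that

$$P\Big(\inf_{\|u\| > B} K_D(Au) \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N, \tag{39}$$

$$P\Big(\inf_{\|u\| > B} \tilde K_D(Au) \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N. \tag{40}$$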


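Proof. As usual, write $K$, $\tilde K$ etc. for $K_D$, $\tilde K_D$ etc. Recall the definition of $\bar\Gamma_n$ from (11). Let $k_n(\theta) := \theta'\bar\Gamma_n\theta$, $\theta \in \mathbb{R}^p$. By the C-S inequality and (11),

$$\sup_{\|\theta\| = 1}|k_n(\theta)|^2 \le \sum_{j=1}^{p}\|\nu_j\|_n^2\int g_n^2\,dG_n = O(1). \tag{41}$$

Fix an $\varepsilon > 0$ and a $z \in (0, \infty)$. Define, for $t \in \mathbb{R}^p$, $1 \le j \le p$,

$$\tilde V_j(t) := \int\{Y_j^0 + t'R_j + m_j\}\,g_n\,dG_n,$$

$$\hat V_j(t) := \int\Big[\sum_{i=1}^{n} d_{nij}\{I(Y_{ni} \le y + x_{ni}'t) - H_{ni}(y)\}\Big]g_n(y)\,dG_n(y).$$

Also, let $\hat V := (\hat V_1, \dots, \hat V_p)'$, $\tilde V := (\tilde V_1, \dots, \tilde V_p)'$, $\eta_n := \int g_n^2\,dG_n$ and $\eta := \limsup_n \eta_n$. Write a $u \in \mathbb{R}^p$ with $\|u\| > B$ as $u = r\theta$, $|r| > B$, $\|\theta\| = 1$. Then, by the C-S inequality,

$$\inf_{\|u\| > B} K(Au) \ge \inf_{|r| > B,\ \|\theta\| = 1}(\theta'\hat V(rA\theta))^2/\eta_n,$$

$$\inf_{\|u\| > B} \tilde K(Au) \ge \inf_{|r| > B,\ \|\theta\| = 1}(\theta'\tilde V(rA\theta))^2/\eta_n.$$

It thus suffices to show that there exist a $B \in (0, \infty)$ and an $N$ such that

$$P\Big(\inf_{|r| > B,\ \|\theta\| = 1}(\theta'\hat V(rA\theta))^2/\eta_n \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N, \tag{39'}$$

$$P\Big(\inf_{|r| > B,\ \|\theta\| = 1}(\theta'\tilde V(rA\theta))^2/\eta_n \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N. \tag{40'}$$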


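But, for all $u \in \mathbb{R}^p$,

$$\|\hat V(Au) - \tilde V(Au)\|^2 \le 2\eta_n\sum_{j=1}^{p}\{|Y_{ju}^0 - Y_{j0}^0|_n^2 + |\mu_{ju}^0 - \mu_{j0}^0 - u'\nu_j|_n^2\}.$$

Thus, from (11), (16) and (30), it follows that, for all $B \in (0, \infty)$,

$$\sup_{\|u\| \le B}\|\hat V(Au) - \tilde V(Au)\| = o_p(1). \tag{42}$$

Now rewrite

$$\theta'\tilde V(rA\theta) = \theta'T + r\,k_n(\theta), \qquad T := (T_1, \dots, T_p)', \quad T_j := \int\{Y_j^0 + m_j\}\,g_n\,dG_n,\ 1 \le j \le p.$$

Again, by the C-S inequality, Fubini, (14) and assumptions (10) and (11), there exist $N_1$ and $b$, possibly both depending on $\varepsilon$, such that

$$P(\|T\| \le b) \ge 1 - (\varepsilon/2), \quad n \ge N_1. \tag{43}$$

Now choose $B$ such that

$$B > \big(b + (z\eta)^{1/2}\big)\alpha^{-1}, \tag{44}$$

where $\alpha$ is as in (11). Then, with $\alpha_n := \inf\{|k_n(\theta)|;\ \|\theta\| = 1\}$,

$$\begin{aligned}
P\Big(\inf_{|r| = B,\ \|\theta\| = 1}(\theta'\tilde V(rA\theta))^2/\eta_n \ge z\Big) &= P\big(|\theta'\tilde V(rA\theta)| \ge (z\eta_n)^{1/2},\ \forall\,\|\theta\| = 1,\ |r| = B\big)\\
&\ge P\big(\big||\theta'T| - |r|\,|k_n(\theta)|\big| \ge (z\eta_n)^{1/2},\ \forall\,\|\theta\| = 1,\ |r| = B\big)\\
&\ge P\big(\|T\| \le -(z\eta_n)^{1/2} + B\alpha_n\big) \ge P\big(\|T\| \le -(z\eta)^{1/2} + B\alpha\big)\\
&\ge P(\|T\| \le b) \ge 1 - (\varepsilon/2), \quad \forall\, n \ge N_1.
\end{aligned} \tag{45}$$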
In the above, the first inequality follows from the fact that $\big||d| - |c|\big| \le |d + c|$ for reals $d$, $c$; the second uses the fact that $|\theta'T| \le \|T\|$ for all $\|\theta\| = 1$; the third uses the relation $(-\infty, -(z\eta)^{1/2} + B\alpha) \subset (-\infty, -(z\eta_n)^{1/2} + B\alpha_n)$; while the last inequality follows from (43) and (44).


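Observe that $\theta'\tilde V(rA\theta)$ is monotonic in $r$ for every $\|\theta\| = 1$. Therefore, (45) implies (40') and hence (40) in a straightforward fashion. Next, consider $\theta'\hat V(rA\theta)$. Rewrite

$$\theta'\hat V(rA\theta) = \int\sum_{i=1}^{n}(\theta'd_{ni})\,[I(Y_{ni} \le y + r\,x_{ni}'A\theta) - H_{ni}(y)]\,g_n(y)\,dG_n(y),$$

which, in view of assumption (12), shows that $\theta'\hat V(rA\theta)$ is monotonic in $r$ for every $\|\theta\| = 1$. Therefore, by (42), there exists an $N_2$, depending on $\varepsilon$, such that

$$\begin{aligned}
P\Big(\inf_{|r| > B,\ \|\theta\| = 1}(\theta'\hat V(rA\theta))^2/\eta_n \ge z\Big) &\ge P\Big(\inf_{|r| = B,\ \|\theta\| = 1}(\theta'\hat V(rA\theta))^2/\eta_n \ge z\Big)\\
&\ge P\Big(\inf_{|r| = B,\ \|\theta\| = 1}(\theta'\tilde V(rA\theta))^2/\eta_n \ge z\Big) - (\varepsilon/2), \quad \forall\, n \ge N_2,\\
&\ge 1 - \varepsilon, \quad \forall\, n \ge N_1 \vee N_2,
\end{aligned}$$

by (45). This proves (39') and hence (39). □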


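The next lemma gives an analogue of the previous lemma for $K_D^+$. Since the proof is quite similar, no details will be given.

Lemma 5.5.5. In addition to the assumptions of Theorem 5.5.2, assume that (11+) and (12) hold, where (11+) is condition (11) with $\Gamma_n$ replaced by $\Gamma_n^+ := (\nu_1^+, \dots, \nu_p^+)$ and where $\{\nu_j^+\}$ are defined just above (32). Then, for every $\varepsilon > 0$ and $0 < z < \infty$, there exist an $N$ (depending only on $\varepsilon$) and a $B$ (depending on $\varepsilon$, $z$) such that

$$P\Big(\inf_{\|u\| > B} K_D^+(Au) \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N, \tag{46}$$

$$P\Big(\inf_{\|u\| > B} \tilde K_D^+(Au) \ge z\Big) \ge 1 - \varepsilon, \quad \forall\, n \ge N. \tag{47}$$

□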
The above two lemmas verify (5.4.A5) for the two dispersions $K_D$ and $K_D^+$. Also note that (40), together with (5) and (10), implies that $\|A^{-1}(\Delta - \beta)\| = O_p(1)$, where $\Delta$ is defined at (49) below. Similarly, (47), (5), (35) and the symmetry assumption (5.3.8) about $\{G_n\}$ imply that $\|A^{-1}(\Delta^+ - \beta)\| = O_p(1)$, where $\Delta^+$ is defined at (53) below. The proofs of these facts are exactly similar to that of (5.4.2) given in the proof of Theorem 5.4.1.

In view of Remark 5.5.1 and Theorem 5.4.1, we have now proved the following theorems.


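Theorem 5.5.3. Assume that (1.1.1) holds with the modeled and actual d.f.'s of the errors $\{e_{ni}, 1 \le i \le n\}$ equal to $\{H_{ni}, 1 \le i \le n\}$ and $\{F_{ni}, 1 \le i \le n\}$, respectively. In addition, suppose that (1)–(12) hold. Then

$$(\hat\beta_D - \Delta)'A^{-1}\mathcal{B}_nA^{-1}(\hat\beta_D - \Delta) = o_p(1), \tag{48}$$

where $\Delta$ satisfies the equation

$$\mathcal{B}_nA^{-1}(\Delta - \beta) = \mathcal{T}_n. \tag{49}$$

If, in addition,

$$\mathcal{B}_n^{-1} \text{ exists for } n \ge p, \tag{50}$$

then

$$A^{-1}(\hat\beta_D - \beta) = \mathcal{B}_n^{-1}\mathcal{T}_n + o_p(1), \tag{51}$$

where $\mathcal{T}_n$ and $\mathcal{B}_n$ are defined at (37). □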


Theorem 5.5.4. Assume that (1.1.1) holds with the actual d.f.'s of the
errors $\{e_{ni}, 1 \le i \le n\}$ equal to $\{F_{ni}, 1 \le i \le n\}$. In addition, suppose that
$\{X, F_{ni}, D, G_n\}$ satisfy (1)–(4), (6)–(9), (5.3.8) for all $n \ge 1$, (11), (12)
and (33). Then

(52) $(\hat\beta_D^+ - \tilde\beta^+)'A^{-1}\mathcal{B}_n^+A^{-1}(\hat\beta_D^+ - \tilde\beta^+) = o_p(1)$,

where $\tilde\beta^+$ satisfies the equation

(53) $\mathcal{B}_n^+A^{-1}(\tilde\beta^+ - \beta) = \mathcal{J}_n^+$.

If, in addition,

(54) $(\mathcal{B}_n^+)^{-1}$ exists for $n \ge p$,

then

(55) $A^{-1}(\hat\beta_D^+ - \beta) = (\mathcal{B}_n^+)^{-1}\mathcal{J}_n^+ + o_p(1)$,

where $\mathcal{J}_n^+$ and $\mathcal{B}_n^+$ are defined at (38). □


Remark 5.5.2. If $\{F_i\}$ are symmetric about zero, then $m_n^+ \equiv 0$ and $\hat\beta_D^+$
is consistent for $\beta$ even if the errors are not identically distributed. On the
other hand, if the errors are identically distributed but not symmetric,
then $\hat\beta_D^+$ will be asymptotically biased. This is not surprising, because here
the symmetry, rather than the identically distributed nature of the errors, is
relevant.

If $\{F_i\}$ are symmetric about an unknown common point, then that
point can also be estimated by the above m.d. method by simply augmenting
the design matrix to include the column $\mathbf{1}$, if not present already. □


Next we turn to $K_D^*$ and $\hat\beta_D^*$ of (5.2.22) and (5.2.23). First we state
a theorem giving an analogue of (28) for $K_D^*$. Let $Y_j$, $\mu_j$ be the $Y_d$, $\mu_d$ of
(2.3.1) with $\{d_{ni}\}$ replaced by $\{d_{nij}\}$, $j = 1, \ldots, p$, $X_{ni}$ replaced by $Y_{ni}$
and $c_{ni} = A_1(x_{ni} - \bar x_n)$, $1 \le i \le n$, where $A_1$ and $\bar x_n$ are defined at
(4.3.11). Set

(56) $R_j(s) := \sum_i (d_{nij} - \bar d_{nj}(s))\, q_{ni}(s)$,

where, for $1 \le j \le p$, $\bar d_{nj}(s) := n^{-1}\sum_i d_{nij}\,\ell_{ni}(s)$, $0 \le s \le 1$, with $\{\ell_{ni}\}$ as in
(3.2.35) and $q_{ni} := f_{ni}(H^{-1})$, $1 \le i \le n$. Let

(57) $\hat K_D^*(t) := \sum_{j=1}^p \int_0^1 \{Y_j(s, 0) - t'R_j(s) + \mu_j(s, 0)\}^2\, dL(s)$.

In (59) below, $L$ in $\hat K_D^*$ is supposed to have been replaced by $L_n$.

Theorem 5.5.5. Let $Y_{n1}, \ldots, Y_{nn}$ be independent r.v.'s with respective
d.f.'s $F_{n1}, \ldots, F_{nn}$. Assume $\{D, X, F_{ni}\}$ satisfy (1), (2), (3), (2.3.3b),
(3.2.12), (3.2.35) and (3.2.36) with $w_i = d_{ij}$, $1 \le j \le p$, $1 \le i \le n$. Let
$\{L_n\}$ be a sequence of d.f.'s on $(0, 1]$ and assume that

(58) $\sum_j \int \mu_j^2(s, 0)\, dL_n(s) = O(1)$.

Then, for every $0 < B < \infty$,

(60) $\sup_{\|u\| \le B} |K_D^*(A_1u) - \hat K_D^*(A_1u)| = o_p(1)$.

Proof. The proof of (60) uses the a.u.l. result of Theorems 3.2.1 and
3.2.4. Details are left out as an exercise. □


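The result (60) shows that the dispersion $K_D^*$ satisfies (5.4.A1) with

(61) $\theta = 0$, $\delta_n = A_1^{-1}$, $S_n = A_1^{-1}\mathcal{J}_n$, $W_n = A_1^{-1}\mathcal{B}_nA_1^{-1}$,

$\mathcal{J}_n = -\int_0^1 \Gamma_n(s)\{Y(s) + \mu(s)\}\, dL_n(s)$,

$\mathcal{B}_n := \int_0^1 \Gamma_n(s)\Gamma_n'(s)\, dL_n(s)$,

where $\Gamma_n(s) = A_1X_c'\Lambda(s)D(s)$, $D(s) := ((d_{nij} - \bar d_{nj}(s)))$, $1 \le i \le n$, $1 \le j \le p$; $\Lambda(s)$
as in (2.3.32), $0 \le s \le 1$; $X_c$ as in (4.2.11); $Y := (Y_1, \ldots, Y_p)'$, $\mu := (\mu_1, \ldots, \mu_p)'$, with $Y_j(s) = Y_j(s, 0)$, $\mu_j(s) = \mu_j(s, 0)$.

Call the condition (11) by the name of (11*) if it holds when $\Gamma_n$ and
$G_n$ are replaced by the $\Gamma_n$ of (61) and $L_n$, respectively. Analogous to Theorem 5.5.4 we have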


Theorem 5.5.6. Assume that (1.1.1) holds with the actual d.f.'s of the
errors $\{e_{ni}, 1 \le i \le n\}$ equal to $\{F_{ni}, 1 \le i \le n\}$. In addition, assume that
$\{D, X, F_{ni}\}$ satisfy $(N_X)$, (2), (3), (2.3.3b), (3.2.12), (3.2.35), (3.2.36) with
$w_i = d_{ij}$, $1 \le j \le p$, $1 \le i \le n$, (11*) and (12). Let $\{L_n\}$ be a sequence of
d.f.'s on $(0, 1]$ satisfying (58). Then

(62) $(\hat\beta_D^* - \tilde\beta^*)'A_1^{-1}\mathcal{B}_nA_1^{-1}(\hat\beta_D^* - \tilde\beta^*) = o_p(1)$,

where $\tilde\beta^*$ satisfies the equation

(63) $\mathcal{B}_nA_1^{-1}(\tilde\beta^* - \beta) = \mathcal{J}_n$.

If, in addition,

(64) $\mathcal{B}_n^{-1}$ exists for $n \ge p$,

then

(65) $A_1^{-1}(\hat\beta_D^* - \beta) = \mathcal{B}_n^{-1}\mathcal{J}_n + o_p(1)$,

where $\mathcal{J}_n$ and $\mathcal{B}_n$ are defined at (61).

The proof of this theorem is similar to that of Theorem 5.5.3. The
details are left out for interested readers. See also Section 4.3. □
Remark 5.5.3. Discussion of the assumptions (1)–(10). Among the
assumptions (1)–(10), the assumptions (7) and (9) are relatively harder to
verify. First, we shall give some sufficient conditions that will imply (7), (9)
and the other assumptions. Then we shall discuss these assumptions in
detail for three cases, viz., the case when the errors are correctly modeled to
be i.i.d. $F$, $F$ a known d.f.; the case when we model the errors to be i.i.d. $F$
but they actually have heteroscedastic gross-error distributions; and, finally,
the case when the errors are modeled to be i.i.d. $F$ but they actually are
heteroscedastic due to differences in scale.

To begin with, consider the following assumptions.

(66) For any sequences of numbers $\{a_{ni}, b_{ni}\}$, $a_{ni} < b_{ni}$, $\max_i (b_{ni} - a_{ni}) \to 0$,

$\limsup_n \max_i (b_{ni} - a_{ni})^{-1}\int_{a_{ni}}^{b_{ni}}\int \{f_{ni}(y + z) - f_{ni}(y)\}^2\, dG_n(y)\, dz = 0.$

(67) $\max_i \int f_{ni}^2\, dG_n = O(1)$.

Claim 5.5.1. Assumptions (1)–(4), (66), (67) imply (7) and (9).


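Proof. Use the C–S inequality twice, the fact that $(d_{ij})^2 \le \|d_i\|^2$ for all
$i, j$, and (2) to obtain

$\sum_j \int \big[\sum_i d_{ij}\{F_i(y + c_i'v + \delta\kappa_i) - F_i(y + c_i'v - \delta\kappa_i)\}\big]^2\, dG_n(y)$
$\quad \le \sum_j\sum_i d_{ij}^2 \int \sum_i (2\delta\kappa_i)\int_{a_i}^{b_i} f_i^2(y + z)\, dz\, dG_n(y)$
$\quad \le 4p^2\delta^2\max_i (2\delta\kappa_i)^{-1}\int_{a_i}^{b_i}\int f_i^2(y + z)\, dG_n(y)\, dz$, (by Fubini),

where $a_i = -\delta\kappa_i + c_i'v$, $b_i = \delta\kappa_i + c_i'v$, $1 \le i \le n$. Therefore, by (66), (67)
and (1),

l.h.s. (7) $\le 4p^2\delta^2k$, $\quad (k := \limsup_n \max_i \int f_i^2\, dG_n)$,

which shows that (7) holds.
Next, by (2) and two applications of the C–S inequality,

l.h.s. (9) $= \sum_j \int \big[\sum_i d_{ij}\{F_i(y + c_i'u) - F_i(y) - c_i'u\, f_i(y)\}\big]^2\, dG_n(y)$
$\quad \le p\sum_i \int \{F_i(y + c_i'u) - F_i(y) - c_i'u\, f_i(y)\}^2\, dG_n(y)$
$\quad = p\Big\{\sum_i^+ \int \big[\int_0^{c_i'u}(f_i(y + z) - f_i(y))\, dz\big]^2 dG_n(y) + \sum_i^- \int \big[\int_{c_i'u}^0 (f_i(y + z) - f_i(y))\, dz\big]^2 dG_n(y)\Big\}$
$\quad \le p\Big\{\sum_i^+ c_i'u\int_0^{c_i'u}\int \{f_i(y + z) - f_i(y)\}^2\, dG_n(y)\, dz + \sum_i^- (-c_i'u)\int_{c_i'u}^0\int \{f_i(y + z) - f_i(y)\}^2\, dG_n(y)\, dz\Big\}$
$\quad \le \Big[\max_i (2|c_i'u|)^{-1}\int_{-|c_i'u|}^{|c_i'u|}\int \{f_i(y + z) - f_i(y)\}^2\, dG_n(y)\, dz\Big]\cdot 4p\sum_i (c_i'u)^2$,

where $\sum_i^+$ ($\sum_i^-$) is the sum over those $i$ for which $c_i'u \ge 0$ ($c_i'u < 0$). Since
$\sum_i (c_i'u)^2 \le pB^2$ for all $u \in N(B)$, (9) now follows from (66) and (1). □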


Now we consider the three special cases mentioned above. 


Case 5.5.1. Correctly modeled i.i.d. errors: $F_{ni} \equiv F \equiv H_{ni}$, $G_n \equiv G$.
Suppose that $F$ has a density $f$ w.r.t. $\lambda$. Assume that

(68) (a) $0 < \int f\, dG < \infty$, (b) $0 < \int f^2\, dG < \infty$.
(69) $\int F(1 - F)\, dG < \infty$.
(70) (a) $\lim_{z \to 0}\int f(y + z)\, dG(y) = \int f\, dG$, (b) $\lim_{z \to 0}\int f^2(y + z)\, dG(y) = \int f^2\, dG$.

Claim 5.5.2. Assumptions (1), (2), (4) with $G_n \equiv G$, and (68)–(70)
imply (1)–(10) with $G_n \equiv G$.

This is easy to see. In fact, here (5) and (6) are equivalent to (68a),
(69) and (70a); (66) and (67) are equivalent to (68b) and (70b). The l.h.s. of
(10) is 0.


Note that if $G$ is absolutely continuous then (68) implies (70). If $G$
is purely discrete and $f$ is continuous at the points of jumps of $G$, then (70)
holds. In particular, if $G = \delta_0$, i.e., if $G$ is degenerate at 0, $\infty > f(0) > 0$
and $f$ is continuous at 0, then (68) and (70) are trivially satisfied. If $G(y) \equiv y$,
(68a) and (70a) are a priori satisfied, while (69) is equivalent to assuming
that $E|e_1 - e_2| < \infty$, $e_1, e_2$ i.i.d. $F$.
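The equivalence between (69) and $E|e_1 - e_2| < \infty$ when $G(y) \equiv y$ comes from the following elementary identity, written out here as a worked step (a sketch; $e_1, e_2$ are i.i.d. $F$, and Fubini is used to exchange $E$ and $dy$):

```latex
\begin{aligned}
E|e_1 - e_2| &= E\int \{ I(e_1 \le y < e_2) + I(e_2 \le y < e_1) \}\, dy \\
&= \int \{ P(e_1 \le y)\,P(e_2 > y) + P(e_2 \le y)\,P(e_1 > y) \}\, dy \\
&= 2\int F(y)\{1 - F(y)\}\, dy .
\end{aligned}
```

So, with $dG(y) = dy$, the integral in (69) is finite exactly when $E|e_1 - e_2|$ is.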
If $dG(y) = \{F(y)(1 - F(y))\}^{-1}dF(y)$, the so-called Darling–Anderson measure, then (68)–(70) are satisfied by a class of d.f.'s that
includes the normal, logistic and double exponential distributions.
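As a rough numerical illustration of the last statement (not part of the original argument), the sketch below evaluates $\int f\,dG$, $\int f^2\,dG$ and $\int F(1-F)\,dG$ for the Darling–Anderson measure, written as $dG = f/\{F(1-F)\}\,dy$, for the normal, logistic and double exponential densities. The function names, the truncation of the real line to $[-30, 30]$ and the Simpson grid are our own illustrative choices, not taken from the text. Note that $\int F(1-F)\,dG = \int dF = 1$ for every $F$, so (69) is automatic for this $G$.

```python
import math

def simpson(g, lo, hi, n=20000):
    """Composite Simpson rule for g on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

# Each factory returns (density f, d.f. F, upper tail S = 1 - F),
# with S written directly to avoid cancellation in 1 - F.
def normal():
    f = lambda y: math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)
    F = lambda y: 0.5 * math.erfc(-y / math.sqrt(2))
    S = lambda y: 0.5 * math.erfc(y / math.sqrt(2))
    return f, F, S

def logistic():
    F = lambda y: 1.0 / (1.0 + math.exp(-y))
    S = lambda y: 1.0 / (1.0 + math.exp(y))
    f = lambda y: F(y) * S(y)
    return f, F, S

def double_exponential():
    f = lambda y: 0.5 * math.exp(-abs(y))
    F = lambda y: 0.5 * math.exp(y) if y < 0 else 1 - 0.5 * math.exp(-y)
    S = lambda y: 1 - 0.5 * math.exp(y) if y < 0 else 0.5 * math.exp(-y)
    return f, F, S

def darling_anderson_integrals(dist, lo=-30.0, hi=30.0):
    """Return (int f dG, int f^2 dG, int F(1-F) dG) for dG = dF / {F(1-F)}."""
    f, F, S = dist()
    w = lambda y: f(y) / (F(y) * S(y))  # density of G w.r.t. Lebesgue measure
    int_f = simpson(lambda y: f(y) * w(y), lo, hi)
    int_f2 = simpson(lambda y: f(y) ** 2 * w(y), lo, hi)
    int_F1F = simpson(lambda y: F(y) * S(y) * w(y), lo, hi)
    return int_f, int_f2, int_F1F
```

For the logistic d.f., $F(1-F) = f$, so the first two integrals are exactly 1 and 1/6; for the double exponential they equal $2\log 2$ and $2\log 2 - 1$; for the normal they are finite but have no simple closed form.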

Case 5.5.2. Heteroscedastic gross errors: $H_{ni} \equiv F$, $F_{ni} = (1 - \delta_{ni})F + \delta_{ni}F_0$. We shall also assume that $G_n \equiv G$. Let $f$ and $f_0$ be continuous
densities of $F$ and $F_0$. Then $\{F_{ni}\}$ have densities $f_{ni} = f + \delta_{ni}(f_0 - f)$, $1 \le i \le n$. Hence (3) is satisfied. Consider the assumptions

(71) $0 \le \delta_{ni} \le 1$, $\max_i \delta_{ni} \to 0$,

(72) $\int |F_0 - F|\, dG < \infty$.

Claim 5.5.3. Suppose that $f_0$ and $f$ satisfy (68) and (70), $F$ satisfies
(69), and suppose that (1), (2) and (4) hold. Then (71) and (72) imply (5)–(9).


Proof. The relation $f_i = f + \delta_i(f_0 - f)$ implies that
$\nu_j - \sum_i d_{ij}c_if = \sum_i d_{ij}c_i\delta_i(f_0 - f)$, $1 \le j \le p$,
and
$\gamma_n - \sum_i \|d_i\|^2f = \sum_i \|d_i\|^2\delta_i(f_0 - f)$.
Because $\sum_i \|d_i\|^2 \le p$ and $\sum_i \|c_i\|^2 = p$, we obtain

$\big|\int [\gamma_n(y + x) - \sum_i \|d_i\|^2 f(y + x)]\, dG(y)\big| \le p\max_i \delta_i\big|\int [f_0(y + x) - f(y + x)]\, dG(y)\big|, \quad \forall\, x \in \mathbb{R}.$

Therefore, by (71), (68a) and (70a), it follows that (6) is satisfied. Similarly,
the inequality

$\sum_j \int \|\nu_j - \sum_i d_{ij}c_if\|^2\, dG \le 2p^2\max_i \delta_i^2\{\int f_0^2\, dG + \int f^2\, dG\}$

ensures the satisfaction of (8). The inequality

$\big|\int \sum_i \|d_i\|^2\{F_i(1 - F_i) - F(1 - F)\}\, dG\big| \le 2p\max_i \delta_i\int |F_0 - F|\, dG$,

together with (69), (71) and (72), implies (5). Next,


146 MINIMUM DISTANCE ESTIMATORS 5.5 


$\int \{f_i(y + x) - f_i(y)\}^2\, dG(y)$
$\quad \le (1 + 2\delta_i)\int \{f(y + x) - f(y)\}^2\, dG(y) + 4\delta_i\int \{f_0(y + x) - f_0(y)\}^2\, dG(y).$

Note that (68b), (70b) and the continuity of $f$ imply that

$\lim_{x \to 0}\int \{f(y + x) - f(y)\}^2\, dG(y) = 0,$

and a similar result holds for $f_0$. Therefore, from the above inequality, (70) and
(71), we see that (66) and (67) are satisfied. By Claim 5.5.1, it follows that
(7) and (9) are satisfied. □


Suppose that $G$ is a finite measure. Then (F1) implies (68)–(70)
and (72). In particular, these assumptions are satisfied by all those $f$'s that
have finite Fisher information.

The assumption (10), in view of (72), amounts to requiring that

(73) $\sum_{j=1}^p \big(\sum_{i=1}^n d_{ij}\delta_i\big)^2 = O(1)$.

But

(74) $\big(\sum_i d_{ij}\delta_i\big)^2 = \sum_i\sum_k d_{ij}d_{kj}\delta_i\delta_k \le \big(\sum_i \|d_i\|\delta_i\big)^2$.

This and (2) suggest that the choice $\delta_i = p^{-1/2}\|d_i\|$ will satisfy (73). Note that
if $D = XA$ then $\|d_i\|^2 = x_i'(X'X)^{-1}x_i$.

When studying the robustness of $\hat\beta_X$ in the following section, $\delta_i = p^{-1}x_i'(X'X)^{-1}x_i$ is a natural choice to use. It is an analogue of a
contamination in the i.i.d. setup. □
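To see concretely why the leverage-based choice $\delta_i = p^{-1}x_i'(X'X)^{-1}x_i$ is compatible with (71) and with the bound (74), here is a small pure-Python check for the straight-line design $x_i = (1, i)'$, $p = 2$. It assumes $A = (X'X)^{-1/2}$ is symmetric, so that with $D = XA$ one has $\sum_j(\sum_i d_{ij}\delta_i)^2 = v'(X'X)^{-1}v$ with $v = \sum_i \delta_i x_i$, and no matrix square root is needed. The design, the function name and the sample sizes are illustrative assumptions only.

```python
def leverage_check(n, p=2):
    """Design rows x_i = (1, i), i = 1..n.

    Returns (sum of leverages, max delta_i, l.h.s. of (73), bound from (74)),
    where delta_i = h_i / p and h_i = x_i'(X'X)^{-1} x_i = ||d_i||^2 for D = XA.
    """
    ts = [float(i) for i in range(1, n + 1)]
    s1, s2 = sum(ts), sum(t * t for t in ts)
    det = n * s2 - s1 * s1
    # (X'X)^{-1} for X'X = [[n, s1], [s1, s2]]
    inv = ((s2 / det, -s1 / det), (-s1 / det, n / det))

    def quad(v):  # v'(X'X)^{-1} v
        return sum(v[a] * inv[a][b] * v[b] for a in range(2) for b in range(2))

    h = [quad((1.0, t)) for t in ts]
    delta = [hi / p for hi in h]
    v = (sum(delta), sum(d * t for d, t in zip(delta, ts)))
    lhs73 = quad(v)  # sum_j (sum_i d_ij delta_i)^2
    # summing (74) over j: at most p * (sum_i ||d_i|| delta_i)^2
    bound74 = p * sum(hi ** 0.5 * d for hi, d in zip(h, delta)) ** 2
    return sum(h), max(delta), lhs73, bound74
```

For this design the leverages always sum to $p = 2$, the largest $\delta_i$ shrinks roughly like $1/n$ (so $\max_i \delta_i \to 0$ as (71) requires), and the quantity in (73) stays below the bound obtained from (74).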


Case 5.5.3. Heteroscedastic scale errors: $H_{ni} \equiv F$, $F_{ni}(y) = F(\tau_{ni}y)$,
$G_n \equiv G$. Let $F$ have a continuous density $f$. Consider the conditions

(75) $\tau_{ni} = \sigma_{ni} + 1$, $\sigma_{ni} > 0$, $1 \le i \le n$; $\max_i \sigma_{ni} \to 0$.
(76) $\lim_{s \to 1}\int |y|^jf^k(sy)\, dG(y) = \int |y|^jf^k(y)\, dG(y)$, $j = 1, k = 1$;
$j = 0, k = 1, 2$.

Claim 5.5.4. Under (1), (2), (4) with $G_n \equiv G$, (68)–(70), (75) and
(76), the assumptions (5)–(9) are satisfied.

Proof. By (41), (43), (49) and Theorems II.4.2.1 and V.1.3.1 of
Hájek–Šidák (op. cit.),
(77) $\lim_{x \to 0}\limsup_n \max_i \int \{f(\tau_i(y + x)) - f(y + x)\}^2\, dG(y) = 0$,
$\lim_{x \to 0}\int |f(y + x) - f(y)|^r\, dG(y) = 0$, $r = 1, 2$.

Now,

$\big|\int \sum_i \|d_i\|^2\{F_i(1 - F_i) - F(1 - F)\}\, dG\big|$
$\quad \le 2p\max_i \int |F(\tau_iy) - F(y)|\, dG(y) \le 2p\max_i \int_1^{\tau_i}\int |y|f(sy)\, dG(y)\, ds$
$\quad = o(1)$, by (48) and (49) with $j = 1$, $r = 1$.

Hence (69) implies (5). Next,

$\big|\int \gamma_n(y + x)\, dG(y) - \sum_i \|d_i\|^2\int f\, dG\big|$
$\quad \le \sum_i \|d_i\|^2\tau_i\int \{|f(\tau_i(y + x)) - f(y + x)| + |f(y + x) - f(y)|\}\, dG(y)$
$\qquad + p\max_i \sigma_i\int f\, dG.$

Therefore, in view of (48), (77) and (68), we obtain (6). Next, consider

$\int \{f_i(y + x) - f_i(y)\}^2\, dG(y)$
$\quad \le 4\tau_i^2\int \{[f(\tau_i(y + x)) - f(y + x)]^2 + [f(y + x) - f(y)]^2 + [f(\tau_iy) - f(y)]^2\}\, dG(y).$

Also,

$\sum_j \int \|\nu_j - \sum_i d_{ij}c_if\|^2\, dG \le p^2\max_i \int \{\tau_if(\tau_iy) - f(y)\}^2\, dG(y)$
$\quad \le 2p^2\max_i \big[\tau_i^2\int \{f(\tau_iy) - f(y)\}^2\, dG(y) + \sigma_i^2\int f^2\, dG\big] = o(1),$

by (75), (70b), (77). Hence (70b) and the fact that $\sum_j \|\sum_i d_{ij}c_i\|^2 \le p^2$
imply (8). □
Here, the assumption (10) is equivalent to having

(78) $\sum_j \int \big[\sum_i d_{ij}\{F(\tau_iy) - F(y)\}\big]^2\, dG(y) = O(1)$.

One sufficient condition for (78), besides requiring $F$ to have a density $f$
satisfying

(79) $\lim_{s \to 1}\int (yf(sy))^2\, dG(y) = \int (yf(y))^2\, dG(y) < \infty$,

is to have

(80) $\sum_{i=1}^n \sigma_i^2 = O(1)$.

One choice of $\{\sigma_i\}$ satisfying (80) is $\sigma_i = n^{-1/2}$, and another choice is
$\sigma_i^2 = x_i'(X'X)^{-1}x_i$, $1 \le i \le n$.
Again, if $f$ satisfies (F1), (F3) and $G$ is a finite measure, then (68)–(70), (76) and (79) are a priori satisfied. □
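The claim that (79) and (80) suffice for (78) can be checked in a few lines. The following worked sketch assumes, as in the proof of Claim 5.5.3, that $\sum_i \|d_i\|^2 \le p$, and uses $\tau_i = 1 + \sigma_i$ with $\sigma_i > 0$ from (75); the first line is the C–S inequality applied to $\int_1^{\tau_i} yf(sy)\,ds$:

```latex
\begin{aligned}
F(\tau_i y) - F(y) &= \int_1^{\tau_i} y f(sy)\, ds, \qquad
\{F(\tau_i y) - F(y)\}^2 \le \sigma_i \int_1^{\tau_i} (y f(sy))^2\, ds, \\
\sum_j \Big[\sum_i d_{ij}\{F(\tau_i y) - F(y)\}\Big]^2
&\le \Big(\sum_j \sum_i d_{ij}^2\Big) \sum_i \{F(\tau_i y) - F(y)\}^2
\le p \sum_i \sigma_i \int_1^{\tau_i} (y f(sy))^2\, ds, \\
\text{l.h.s. of (78)} &\le p \Big(\sum_i \sigma_i^2\Big)
\sup_{1 \le s \le 1 + \max_i \sigma_i} \int (y f(sy))^2\, dG(y) = O(1).
\end{aligned}
```

The last step integrates with respect to $dG$ and uses Fubini; (79) keeps the supremum bounded once $\max_i \sigma_i$ is small, and (80) bounds $\sum_i \sigma_i^2$.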


Now we shall give a set of sufficient conditions that will yield (5.4.A1)
for the $Q$ of (5.2.13). Since $Q$ does not satisfy (5.3.21), the distribution of
$Q$ under (1.1.1) is not independent of $\beta$. Therefore, care has to be taken to
exhibit this dependence clearly when formulating a theorem pertaining to $Q$.
This of course complicates the presentation somewhat. As before, with
$\{H_{ni}\}$, $\{F_{ni}\}$ denoting the modeled and the actual d.f.'s of $\{e_{ni}\}$, define for
$0 \le s \le 1$, $y \in \mathbb{R}$, $t \in \mathbb{R}^p$,

(81) $H_n(s, y, t) := n^{-1}\sum_{i \le ns} H_{ni}(y - x_{ni}'t)$,

$m_n(s, y) := n^{-1/2}\sum_{i \le ns}\{F_{ni}(y - x_{ni}'\beta) - H_{ni}(y - x_{ni}'\beta)\}$,

$M_n(s, y) := n^{-1/2}\sum_{i \le ns}\{I(Y_{ni} \le y) - F_{ni}(y - x_{ni}'\beta)\}$,

$d\alpha_n(s, y) := dL_n(s)\, dG_n(y)$.

Observe that

$Q(t) = \int [M_n(s, y) + m_n(s, y) - n^{1/2}\{H_n(s, y, t) - H_n(s, y, \beta)\}]^2\, d\alpha_n(s, y)$.

Note that the single integral is over the set $[0, 1] \times \mathbb{R}$.
Assume that $\{H_{ni}\}$ have densities $\{h_{ni}\}$ w.r.t. $\lambda$ and set

(82) $R_n(s, y) := n^{-1/2}\sum_{i \le ns} x_{ni}h_{ni}(y - x_{ni}'\beta)$,

$h_n^2(y) := n^{-1}\sum_{i=1}^n h_{ni}^2(y - x_{ni}'\beta)$, $s \in [0, 1]$, $y \in \mathbb{R}$,

$V_n := AR_n$, $\mathcal{B}_n := \int V_nV_n'\, d\alpha_n$.

Finally, define, for $t \in \mathbb{R}^p$,

(83) $\hat Q(t) := \int [M_n(s, y) + m_n(s, y) + t'R_n(s, y)]^2\, d\alpha_n(s, y)$.

Theorem 5.5.7. Assume that (1.1.1) holds with the actual and the
modeled d.f.'s of the errors $\{e_{ni}, 1 \le i \le n\}$ equal to $\{F_{ni}, 1 \le i \le n\}$ and
$\{H_{ni}, 1 \le i \le n\}$, respectively. In addition, assume that (1) holds, $\{H_{ni}, 1 \le i \le n\}$ have densities $\{h_{ni}, 1 \le i \le n\}$ w.r.t. $\lambda$, and the following hold.

(84) $\int h_n^2\, dG_n = O(1)$.

(85) $\forall\, v \in N(B)$, $\forall\, \delta > 0$,

$\limsup_n \max_i (2\delta\kappa_{ni})^{-1}\int_{a_{ni}}^{b_{ni}}\int h_{ni}(y - x_{ni}'\beta + z)\, dG_n(y)\, dz$
$\quad = \limsup_n \max_i \int h_{ni}(y - x_{ni}'\beta)\, dG_n(y) < \infty$,

where $a_{ni} = -\delta\kappa_{ni} - c_{ni}'v$, $b_{ni} = \delta\kappa_{ni} - c_{ni}'v$, $\kappa_{ni} = \|c_{ni}\|$, $c_{ni} = Ax_{ni}$, $1 \le i \le n$.

(86) $\forall\, u \in N(B)$,

$\int \{n^{1/2}[H_n(s, y, \beta + Au) - H_n(s, y, \beta)] + u'V_n(s, y)\}^2\, d\alpha_n(s, y) = o(1)$.

(87) $\int n^{-1}\sum_i F_{ni}(y - x_{ni}'\beta)(1 - F_{ni}(y - x_{ni}'\beta))\, dG_n(y) = O(1)$.
(88) $\int m_n^2(s, y)\, d\alpha_n(s, y) = O(1)$.

Then, $\forall\, 0 < B < \infty$,

(89) $E\sup_{\|u\| \le B}|Q(\beta + Au) - \hat Q(Au)| = o(1)$.

The details of the proof are similar to those of Theorem 5.5.1 and are
left out as an exercise for interested readers.
An analogue of (51) for the m.d. estimator based on $Q$ will appear in the next section as
Theorem 5.6a.3. Its asymptotic distribution in the case when the errors are
correctly modeled to be i.i.d. will also be discussed there.

We shall end this section by stating analogues of some of the above
results that will be useful when an unknown scale is also being estimated. To
begin with, consider $K_D$ of (5.2.24). To simplify writing, let

(90) $K^0(s, u) := K_D(1 + sn^{-1/2}, Au)$, $s \in \mathbb{R}$, $u \in \mathbb{R}^p$.

Write $a_s := 1 + sn^{-1/2}$. Then from (5.2.24) and (90),

(91) $K^0(s, u) = \sum_j \int \{Y_j^0(ya_s, u) + \mu_j(ya_s, u) - \sum_i d_{ij}H_i(y)\}^2\, dG_n(y)$,

where $H_i$ is the d.f. of $e_i$, $1 \le i \le n$, and where $\mu_j$, $Y_j^0$ are as in (9) and
(13), respectively. Writing $\mu_j(y)$, $Y_j^0(y)$, etc. for $\mu_j(y, 0)$, $Y_j^0(y, 0)$, etc., we
obtain

(92) $K^0(s, u) = \sum_j \int \{[Y_j^0(ya_s, u) - Y_j^0(y)] + [\mu_j(ya_s) - \mu_j(y) - syv_j(y)]$
$\qquad + Y_j^0(y) + u'\nu_j(y) + syv_j(y) + m_j(y)$
$\qquad + [\mu_j(ya_s, u) - \mu_j(ya_s) - u'\nu_j(ya_s)]$
$\qquad + u'[\nu_j(ya_s) - \nu_j(y)]\}^2\, dG_n(y)$,

where $\nu_j$ is as in (8) and

(93) $v_j(y) := n^{-1/2}\sum_i d_{ij}f_i(y)$, $1 \le j \le p$.

The representation (92) suggests the following approximating candidate:

(94) $\hat K^0(s, u) := \sum_j \int \{Y_j^0 + u'\nu_j + syv_j + m_j\}^2\, dG_n$.

We now state

Lemma 5.5.5. With $\gamma_n$ as in (6), assume that $\forall\, |s| \le b$, $0 < b < \infty$,

(95) $\lim_{x \to 0}\limsup_n \int \gamma_n((1 + sn^{-1/2})y + x)\, dG_n(y) = \limsup_n \int \gamma_n(y)\, dG_n(y) < \infty$,

and
(96) $\lim_{z \to 0}\limsup_n \int |y|\gamma_n(y + zy)\, dG_n(y) = \limsup_n \int |y|\gamma_n(y)\, dG_n(y) < \infty$.

Moreover, assume that $\forall\, (s, v) \in [-b, b] \times N(B) =: \mathcal{M}$, and $\forall\, \delta > 0$,

(97) $\limsup_n \sum_j \int \big[\sum_i d_{nij}\{F_{ni}(ya_s + c_{ni}'v + \delta(n^{-1/2}|y| + \kappa_{ni})) - F_{ni}(ya_s + c_{ni}'v - \delta(n^{-1/2}|y| + \kappa_{ni}))\}\big]^2\, dG_n(y) \le k\delta^2$,

for some $k$ not depending on $(s, v)$ and $\delta$.
Then, $\forall\, 0 < b, B < \infty$,

(98) $E\sup \sum_j \int \{Y_j^0((1 + sn^{-1/2})y, u) - Y_j^0(y)\}^2\, dG_n = o(1)$,

where the supremum is taken over $(s, u) \in \mathcal{M}$.

Proof. For each $(s, u) \in \mathcal{M}$, with $a_s = 1 + sn^{-1/2}$,

$E\sum_j \int \{Y_j^0(ya_s, u) - Y_j^0(y)\}^2\, dG_n(y)$
$\quad \le \int_{-B_n}^{B_n}\int \gamma_n(ya_s + x)\, dG_n(y)\, dx + \int_{-b_n}^{b_n}\int |y|\gamma_n(y + zy)\, dG_n(y)\, dz$,

where $B_n = B\max_i \|c_i\|$, $b_n = bn^{-1/2}$. Therefore, from (95) and (96), for
every $(s, u) \in \mathcal{M}$,

$E\sum_j \int \{Y_j^0(ya_s, u) - Y_j^0(y)\}^2\, dG_n(y) = o(1)$.

Now proceed as in the proof of (16), using the monotonicity of $Y_j(a, t)$,
$\mu_j(a, u)$ and the compactness of $\mathcal{M}$, to conclude (98). Use (97) in place of
(7). The details are left out as an exercise.


The proof of the following lemma is quite similar to that of (30).

Lemma 5.5.6. Let $G_s(y) := G_n(y/a_s)$. Assume that for each fixed
$(s, u) \in \mathcal{M}$, (8) and (9) hold with $G_n$ replaced by $G_s$. Moreover, assume the
following:
(99) $\sum_j \int (v_j(y))^2\, dG_n(y) = O(1)$.
(100) $\sum_j \int \{\mu_j(ya_s) - \mu_j(y) - syv_j(y)\}^2\, dG_n(y) = o(1)$, $\forall\, |s| \le b$.
Then, $\forall\, 0 < b, B < \infty$,
(101) $\sup \sum_j \int \{\mu_j(ya_s, u) - \mu_j(a_sy) - u'\nu_j(a_sy)\}^2\, dG_n(y) = o(1)$,
and
(102) $\sup \sum_j \int \{\mu_j(ya_s) - \mu_j(y) - syv_j(y)\}^2\, dG_n(y) = o(1)$,
where the supremum in (101), (102) is taken over $(s, u) \in \mathcal{M}$ and $|s| \le b$,
respectively.

Theorem 5.5.8. Let $Y_{n1}, \ldots, Y_{nn}$ be independent r.v.'s with respective
d.f.'s $F_{n1}, \ldots, F_{nn}$. Assume (1)–(5), (8), (10), (95)–(97) and the
conditions of Lemma 5.5.6 hold. Moreover, assume that for each $|s| \le b$,

(103) $\sum_j \int \|\nu_j(ya_s) - \nu_j(y)\|^2\, dG_n(y) = o(1)$.

Then, $\forall\, 0 < b, B < \infty$,

(104) $E\sup |K^0(s, u) - \hat K^0(s, u)| = o(1)$,

where the supremum is taken over $(s, u) \in \mathcal{M}$.
The proof of this theorem is quite similar to that of Theorem 5.5.1. □


5.6. ASYMPTOTIC DISTRIBUTIONS, EFFICIENCIES AND
ROBUSTNESS


5.6a. Asymptotic Distributions and Efficiencies


To begin with, consider Case 5.5.1 and the class of estimators $\{\hat\beta_D\}$.
Recall that in this case the errors $\{e_{ni}\}$ of (1.1.1) are correctly modeled to
be i.i.d. $F$, i.e., $H_{ni} \equiv F \equiv F_{ni}$. We shall also take $G_n \equiv G$, $G \in \mathcal{DI}(\mathbb{R})$.
Assume that (5.5.68)–(5.5.70) hold. The various quantities appearing in
(5.5.37) and Theorem 5.5.3 now take the following simpler forms.
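(1) $\Gamma_n(y) = AX'D\,f(y)$, $y \in \mathbb{R}$, $\mathcal{B}_n = AX'DD'XA\int f^2\, dG$,
$\mathcal{J}_n = -AX'D\int Y^0f\, dG$.

Note that $\mathcal{B}_n^{-1}$ will exist if and only if the rank of $D$ is $p$. Note
also that

(2) $\mathcal{B}_n^{-1}\mathcal{J}_n = -(D'XA)^{-1}\int Y^0f\, dG\,\big/\int f^2\, dG$
$\quad = (D'XA\int f^2\, dG)^{-1}\sum_i d_i[\psi(e_i) - E\psi(e_i)]$,

where $\psi(y) := \int_{-\infty}^y f\, dG$, $y \in \mathbb{R}$.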
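The second equality in (2) rests on the following identity, spelled out here as a worked step. It assumes, as in (1), that $Y^0(y) = \sum_i d_i\{I(e_i \le y) - F(y)\}$, and uses that $\psi(\infty) = \int f\,dG < \infty$ by (5.5.68a) and that $F$ is continuous, so that an $e_i$ falls on an atom of $G$ with probability zero:

```latex
\begin{aligned}
\int I(e \le y) f(y)\, dG(y) &= \psi(\infty) - \psi(e), \qquad
\int F f\, dG = E\int I(e \le y) f(y)\, dG(y) = \psi(\infty) - E\psi(e), \\
-\int Y^0 f\, dG &= -\sum_i d_i \int \{ I(e_i \le y) - F(y) \} f(y)\, dG(y)
= \sum_i d_i\,[\psi(e_i) - E\psi(e_i)] .
\end{aligned}
```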


Because $G_n \equiv G \in \mathcal{DI}(\mathbb{R})$, there always exists a $g \in L^2(G)$ such that $g > 0$
and $0 < \int g^2\, dG < \infty$. Take $g_n \equiv g$ in (5.5.11). Then the condition
(5.5.11) translates to assuming that

(3) $\liminf_n \inf_{\|\theta\| = 1}|\theta'D'XA\theta| \ge a$ for some $a > 0$.

Condition (5.5.12) implies that $\theta'D'XA\theta > 0$ or $\theta'D'XA\theta < 0$, $\forall$
$\|\theta\| = 1$ and $\forall\, n \ge 1$. It need not imply (3). The above discussion together
with the L–F Cramér–Wold theorem leads to

Corollary 5.6a.1. Assume that (1.1.1) holds with the error r.v.'s
correctly modeled to be i.i.d. $F$, $F$ known. In addition, assume that (5.5.1),
(5.5.2), (5.5.12), (5.5.68)–(5.5.70), (3) and (4) hold, where

(4) $(D'XA)^{-1}$ exists for all $n \ge p$.

Then,

(5) $A^{-1}(\hat\beta_D - \beta) = (D'XA\int f^2\, dG)^{-1}\sum_i d_i[\psi(e_i) - E\psi(e_i)] + o_p(1)$.

If, in addition, we assume

(6) $\max_i \|d_i\|^2 = o(1)$,

then

(7) $\Sigma_D^{-1/2}A^{-1}(\hat\beta_D - \beta) \to_d N(0, \tau^2I_{p\times p})$,

where

$\Sigma_D := (D'XA)^{-1}D'D(AX'D)^{-1}$, $\tau^2 := \mathrm{Var}\,\psi(e)/(\int f^2\, dG)^2$. □
For any two square matrices $L_1$ and $L_2$ of the same order, by
$L_1 \ge L_2$ we mean that $L_1 - L_2$ is non-negative definite. Let $L$ and $J$ be
two $p \times n$ matrices such that $(LL')^{-1}$ exists. The C–S inequality for
matrices states that

(8) $JJ' \ge JL'(LL')^{-1}LJ'$, with equality if and only if $J \propto L$.

Now note that if $D = XA$ then $\Sigma_D = I_{p\times p}$. In general, upon choosing
$J = D'$, $L = AX'$ in (8), we obtain

$D'D \ge D'XA\cdot AX'D$, or $\Sigma_D \ge I_{p\times p}$,

with equality if and only if $D \propto XA$. From these observations we deduce


Theorem 5.6a.1. (Optimality of $\hat\beta_X$.) Suppose that (1.1.1) holds with
the error r.v.'s correctly modeled to be i.i.d. $F$. In addition, assume that
(5.5.1), (5.5.4) with $G_n \equiv G$, and (5.5.68)–(5.5.70) hold. Then, among the class
of estimators $\{\hat\beta_D$; $D$ satisfying (5.5.2), (5.5.12), (3), (4) and (6)$\}$, the
estimator that minimizes the asymptotic variance of $b'A^{-1}(\hat\beta_D - \beta)$, for
every $b \in \mathbb{R}^p$, is $\hat\beta_X$, the $\hat\beta_D$ with $D = XA$. □
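The following sketch, for the special case $p = 1$, makes the optimality statement concrete. With a single regressor $x = (x_1, \ldots, x_n)'$, weights $d = (d_1, \ldots, d_n)'$, and taking $A = (x'x)^{-1/2}$, the matrix $\Sigma_D$ of (7) reduces to the scalar $(x'x)(d'd)/(d'x)^2$, so the asymptotic variance $\tau^2\Sigma_D$ of $A^{-1}(\hat\beta_D - \beta)$ is at least $\tau^2$ by the ordinary Cauchy–Schwarz inequality, with equality exactly when $d$ is proportional to $x$. The function name and the truncated weights used in the usage note are illustrative only.

```python
def sigma_D(x, d):
    """p = 1 case of Sigma_D = (D'XA)^{-1} D'D (AX'D)^{-1} with A = (x'x)^{-1/2}.

    Here D'XA = d'x / sqrt(x'x), so Sigma_D = (x'x)(d'd) / (d'x)^2 >= 1.
    """
    xx = sum(v * v for v in x)
    dd = sum(v * v for v in d)
    dx = sum(a * b for a, b in zip(d, x))
    return xx * dd / dx ** 2
```

For $x_i = i$, $1 \le i \le 10$, the truncated weights $d_i = \min(x_i, 5)$, a bounded function of the design of the kind discussed in Section 5.6b, give a value strictly above 1, while any $d$ proportional to $x$ gives exactly 1.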


Observe that under (5.5.1), $D = XA$ a priori satisfies (5.5.2), (3), (4)
and (6). Consequently we obtain

Corollary 5.6a.2. (Asymptotic normality of $\hat\beta_X$.) Assume that (1.1.1)
holds with the error r.v.'s correctly modeled to be i.i.d. $F$. In addition,
assume that (5.5.1) and (5.5.68)–(5.5.70) hold. Then,

(9) $A^{-1}(\hat\beta_X - \beta) \to_d N(0, \tau^2I_{p\times p})$. □

Remark 5.6a.1. Write $\hat\beta_X(G)$ for $\hat\beta_X$ to emphasize the dependence
on $G$. The above theorem proves the optimality of $\hat\beta_X(G)$ among a class of
estimators $\{\hat\beta_D(G)$, as $D$ varies$\}$. To obtain an asymptotically efficient
estimator at a given $F$ among the class of estimators $\{\hat\beta_X(G)$, $G$ varies$\}$,
one must have $F$ and $G$ satisfy the following relation. Assume that $F$
satisfies (3.2.a) of Theorem 3.2.3, that all of the derivatives that occur below
make sense, and that (5.5.68) holds. Then a $G$ that will give an asymptotically
efficient $\hat\beta_X(G)$ must satisfy the relation
$-f\,dG = (1/I(f))\,d(\dot f/f)$, $\quad I(f) := \int (\dot f/f)^2\, dF$.

From this it follows that the m.d. estimators $\hat\beta_X(G)$, for $G$ satisfying the
relations $dG(y) = 6\,dy$ and $dG(y) = 4\,d\delta_0(y)$, are asymptotically
efficient at logistic and double exponential error d.f.'s, respectively.
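A quick numerical sanity check of the logistic case, under the assumption that $G$ has a density $g$, so that the relation reads $g(y) = -(\dot f/f)'(y)/\{I(f)f(y)\}$: the sketch computes the score $\dot f/f$ and its derivative by central differences and $I(f)$ by a trapezoid rule, and recovers $g \equiv 6$. The step sizes, the grid and the function names are our own choices, not from the text.

```python
import math

def logistic_pdf(y):
    e = math.exp(-abs(y))  # symmetric form, avoids overflow in the tails
    return e / (1.0 + e) ** 2

def score(y, eps=1e-5):
    """(f'/f)(y) = d/dy log f(y), by a central difference."""
    return (math.log(logistic_pdf(y + eps)) - math.log(logistic_pdf(y - eps))) / (2 * eps)

def fisher_information(lo=-40.0, hi=40.0, n=40000):
    """I(f) = int (f'/f)^2 dF, trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    vals = [score(lo + k * h) ** 2 * logistic_pdf(lo + k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def efficient_g(y, info, eps=1e-4):
    """Density g of G solving -f dG = (1/I(f)) d(f'/f) at the point y."""
    dscore = (score(y + eps) - score(y - eps)) / (2 * eps)
    return -dscore / (info * logistic_pdf(y))

info = fisher_information()
```

Since $\dot f/f = 1 - 2F$ for the logistic d.f., $(\dot f/f)' = -2f$ and $I(f) = 1/3$, so the computed $g(y)$ should be close to 6 at every $y$.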


For $\hat\beta_X(G)$ to be asymptotically efficient at $N(0, 1)$ errors, $G$
would have to satisfy $f(y)\,dG(y) = dy$. But such a $G$ does not satisfy
(5.5.68). Consequently, under the current state of affairs, one cannot estimate
$\beta$ asymptotically efficiently at the $N(0, 1)$ error d.f. by using a $\hat\beta_X(G)$.
This naturally leaves one open problem, viz., is the conclusion of Corollary
5.6a.2 true without requiring $\int f\, dG < \infty$, $0 < \int f^2\, dG < \infty$? □


Observe that Theorem 5.6a.1 does not include the estimator $\hat\beta_1$, the
$\hat\beta_D$ when $D = n^{-1/2}[\mathbf{1}, 0, \ldots, 0]_{n\times p}$, i.e., the m.d. estimator defined at (5.2.4),
(5.2.5) after $H_{ni}$ is replaced by $F$ in there. The main reason for this is
that the given $D$ does not satisfy (4). However, Theorem 5.5.3 is general
enough to cover this case also. Upon specializing that theorem and applying
(5.5.49), one obtains the following

Theorem 5.6a.2. Assume that (1.1.1) holds with the errors correctly
modeled to be i.i.d. $F$. In addition, assume that (5.5.1), (5.5.68)–(5.5.70)
and the following hold.

(10) Either
$n^{-1/2}\theta_1x_i'A\theta > 0$ for all $1 \le i \le n$, all $\|\theta\| = 1$,
or
$n^{-1/2}\theta_1x_i'A\theta < 0$ for all $1 \le i \le n$, all $\|\theta\| = 1$.

(11) $\liminf_n \inf_{\|\theta\| = 1}|n^{1/2}\theta_1\bar x_n'A\theta| \ge a > 0$,

where $\bar x_n$ is as in (4.2a.11) and $\theta_1$ is the first coordinate of $\theta$. Then

(12) $n^{1/2}\bar x_n'AA^{-1}(\hat\beta_1 - \beta) = Z_n/\int f^2\, dG + o_p(1)$,

where

$Z_n := n^{-1/2}\sum_i \{\psi(e_{ni}) - E\psi(e_{ni})\}$, with $\psi$ as in (2).

Consequently, $n^{1/2}\bar x_n'(\hat\beta_1 - \beta)$ is asymptotically a $N(0, \tau^2)$ r.v. □




Next, we focus on the class of estimators $\{\hat\beta_D^+\}$ and the case of i.i.d.
symmetric errors. An analogue of Corollary 5.6a.1 is obtained with the help
of Theorem 5.5.4 instead of Theorem 5.5.3 and is given in Corollary 5.6a.3.
The details of its proof are similar to those of Corollary 5.6a.1.

Corollary 5.6a.3. Assume that (1.1.1) holds with the errors correctly
modeled to be i.i.d. symmetric around 0. In addition, assume that (5.3.8),
(5.5.1), (5.5.2), (5.5.4) with $G_n \equiv G$, (5.5.68), (5.5.70), (3), (4) and (13) hold,
where

(13) $\int (1 - F)\, dG < \infty$.

Then,

(14) $A^{-1}(\hat\beta_D^+ - \beta) = -\{2D'XA\int f^2\, dG\}^{-1}\int W^+(y)f^+(y)\, dG(y) + o_p(1)$,

where $f^+(y) := f(y) + f(-y)$ and $W^+(y)$ is $W^+(y, 0)$ of (5.5.32). If, in
addition, (6) holds, then

(15) $\Sigma_D^{-1/2}A^{-1}(\hat\beta_D^+ - \beta) \to_d N(0, \tau^2I_{p\times p})$. □

Consequently, an analogue of Theorem 5.6a.1 holds for $\hat\beta_X^+$ also, and
Remark 5.6a.1 applies equally to the class of estimators $\{\hat\beta_X^+(G)$, $G$ varies$\}$,
assuming that the errors are symmetric around 0. We leave it to interested
readers to state and prove an analogue of Theorem 5.6a.2 for $\hat\beta_1^+$.

Now consider the class of estimators $\{\hat\beta_D^*\}$ of (5.2.23). Recall the
notation in (5.5.61) and Theorem 5.5.6. The distributions of these estimators
will be discussed when the errors in (1.1.1) are correctly modeled to be i.i.d.
$F$, $F$ an arbitrary d.f., and when $L_n \equiv L$. In this case, various entities of
Theorem 5.5.6 acquire the following forms:

$\mu \equiv 0$; $\ell_{ni}(s) \equiv 1$; $D(s) \equiv D$, under (5.2.21);
$\Gamma_n(s) = A_1X_c'D\,q(s)$, $q := f(F^{-1})$;
$\mathcal{J}_n = -A_1X_c'D\int Yq\, dL = A_1X_c'D\sum_i d_{ni}\varphi(F(e_{ni}))$;
$\mathcal{B}_n = (A_1X_c'DD'X_cA_1)\int q^2\, dL$,

where $X_c$ and $A_1$ are defined at (4.2a.11) and where

$\varphi(u) := \int_0^u q(s)\, dL(s)$, $0 \le u \le 1$.
Arguing as for Corollary 5.6a.1, one obtains the following

Corollary 5.6a.4. Assume that (1.1.1) holds with the errors correctly
modeled to be i.i.d. $F$ and that $L$ is a d.f. In addition, assume that (F1),
$(N_X)$, (5.2.21), (5.5.2), and the following hold.

(16) $\liminf_n \inf_{\|\theta\| = 1}|\theta'D'X_cA_1\theta| \ge a > 0$.
(17) Either
$\theta'd_{ni}(x_{ni} - \bar x_n)'A_1\theta \ge 0$, $\forall\, 1 \le i \le n$, $\forall\, \|\theta\| = 1$,
or
$\theta'd_{ni}(x_{ni} - \bar x_n)'A_1\theta \le 0$, $\forall\, 1 \le i \le n$, $\forall\, \|\theta\| = 1$.
(18) $(D'X_cA_1)^{-1}$ exists for all $n \ge p$.

Then,

(19) $A_1^{-1}(\hat\beta_D^* - \beta) = (D'X_cA_1\int_0^1 q^2\, dL)^{-1}\sum_i d_{ni}\varphi(F(e_{ni})) + o_p(1)$.

If, in addition, (6) holds, then

(20) $\Sigma_1^{-1/2}A_1^{-1}(\hat\beta_D^* - \beta) \to_d N(0, \sigma_0^2I_{p\times p})$,

where $\Sigma_1 := (D'X_cA_1)^{-1}D'D(A_1X_c'D)^{-1}$, $\sigma_0^2 := \mathrm{Var}\,\varphi(F(e))/(\int_0^1 q^2\, dL)^2$.
Consequently,

(21) $A_1^{-1}(\hat\beta_X^* - \beta) \to_d N(0, \sigma_0^2I_{p\times p})$,

and $\{\hat\beta_X^*\}$ is asymptotically efficient among all $\{\hat\beta_D^*$, $D$ satisfying the above
conditions$\}$. □


Consider the case when $L(s) \equiv s$. Then

$\sigma_0^2 = \big(\int f^3(x)\, dx\big)^{-2}\int\int [F(x \wedge y) - F(x)F(y)]f^2(x)f^2(y)\, dx\, dy$.

It is interesting to make a numerical comparison of this variance with
that of some other well-celebrated estimators. Let $\sigma_W^2$, $\sigma_{lad}^2$, $\sigma_{LS}^2$ and $\sigma_{NS}^2$
denote the respective asymptotic variances of the Wilcoxon rank, the least
absolute deviation, the least squares and the normal scores estimators of $\beta$.
Recall, either from Chapter 4 or from Jaeckel (1972), that
$\sigma_W^2 = (1/12)\{\int f^2(x)\, dx\}^{-2}$; $\sigma_{lad}^2 = (2f(0))^{-2}$; $\sigma_{LS}^2 = \sigma^2$;

$\sigma_{NS}^2 = \{\int f^2(x)/\phi(\Phi^{-1}(F(x)))\, dx\}^{-2}$,

where $\sigma^2$ is the error variance and $\phi$, $\Phi$ denote the standard normal density and d.f. Using these we obtain the following table.


Table I

                 σ_0² (L(s) = s)   σ_W²     σ_lad²   σ_LS²    σ_NS²
Double Exp.      1.2               1.333    1        2        π/2
Logistic         3.036             3        4        π²/3     π
Normal           1.095             π/3      π/2      1        1


It thus follows that the m.d. estimator $\hat\beta_X^*(L)$, with $L(s) \equiv s$, is
superior to the Wilcoxon rank estimator and the l.a.d. estimator at double
exponential and logistic errors, respectively. At normal errors, it has smaller
variance than the l.a.d. estimator and compares favorably with the optimal
estimator. The same is true for the m.d. estimator $\hat\beta_X(F)$.
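The entries of Table I can be reproduced numerically. The sketch below (our own illustration, standard-library Python only) computes $\sigma_0^2$ for $L(s) \equiv s$ by rewriting the displayed formula as $\mathrm{Var}\,\chi(e)/(\int f^3)^2$ with $\chi(x) := \int_{-\infty}^x f^2(y)\,dy$; this uses $\chi(e) = \int I(x \le e)f^2(x)\,dx$, whose variance is exactly the double integral above. It also evaluates $\sigma_W^2$ and $\sigma_{lad}^2$ from the formulas just given. The function names, the truncation range $[-30, 30]$ and the trapezoid grid are illustrative choices.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def logistic_pdf(x):
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def variances(f, lo=-30.0, hi=30.0, n=60000):
    """Return (sigma_0^2 with L(s) = s, sigma_W^2, sigma_lad^2) for a density f."""
    h = (hi - lo) / n
    fx = [f(lo + k * h) for k in range(n + 1)]

    def trap(vals):
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    # chi(x) = int_{-inf}^x f^2, built by a cumulative trapezoid rule
    chi = [0.0]
    for k in range(1, n + 1):
        chi.append(chi[-1] + 0.5 * h * (fx[k - 1] ** 2 + fx[k] ** 2))
    int_f3 = trap([v ** 3 for v in fx])
    mean = trap([c * v for c, v in zip(chi, fx)])        # E chi(e)
    second = trap([c * c * v for c, v in zip(chi, fx)])  # E chi(e)^2
    sigma0 = (second - mean ** 2) / int_f3 ** 2
    sigma_w = 1.0 / (12.0 * chi[-1] ** 2)                # chi[-1] = int f^2
    sigma_lad = 1.0 / (2.0 * f(0.0)) ** 2
    return sigma0, sigma_w, sigma_lad
```

Up to discretization error this gives $\sigma_0^2 = 6/5$ for the double exponential, $85/28 \approx 3.036$ for the logistic and $1.5\arcsin(2/3) \approx 1.095$ for the normal, in agreement with the comparisons stated above.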


Next, we shall discuss the m.d. estimator $\hat\beta$ based on $Q$. In the following theorem the framework is
the same as in Theorem 5.5.7. Also see (5.5.82) for the definitions of $V_n$,
$\mathcal{B}_n$, etc.

Theorem 5.6a.3. In addition to the assumptions of Theorem 5.5.7,
assume that

(22) $\liminf_n \inf_{\|\theta\| = 1}|\int \theta'V_n\, d\alpha_n| \ge a$, for some $a > 0$.

Moreover, assume that (10) holds and that

(23) $\mathcal{B}_n^{-1}$ exists for all $n \ge p$.

Then,

(24) $A^{-1}(\hat\beta - \beta) = -\mathcal{B}_n^{-1}\int V_n(s, y)\{M_n(s, y) + m_n(s, y)\}\, d\alpha_n(s, y) + o_p(1)$.

Proof. The proof of (24) is similar to that of (5.5.51), hence no details
are given. □


Corollary 5.6a.5. Suppose that the conditions of Theorem 5.6a.3 are
satisfied by $F_{ni} \equiv F \equiv H_{ni}$, $G_n \equiv G$, $L_n \equiv L$, where $F$ is supposed to have a
continuous density $f$. Let

(25) $C := \int\int\int\int n^{-1}\sum_{i \le ns}\sum_{j \le nt} Ax_ix_j'A\,f_i(y)f_j(z)\,(s \wedge t)[F(y \wedge z) - F(y)F(z)]\, d\alpha(s, y)\, d\alpha(t, z)$,

where $f_i(y) := f(y - x_i'\beta)$ and $d\alpha(s, y) := dL(s)\,dG(y)$. Then the asymptotic
distribution of $A^{-1}(\hat\beta - \beta)$ is $N(0, \Sigma_0(\beta))$, where $\Sigma_0(\beta) := \mathcal{B}_n^{-1}C\mathcal{B}_n^{-1}$. □


Because of the dependence of $\Sigma_0$ on $\beta$, no clear-cut comparison
between $\hat\beta$ and $\hat\beta_X$ in terms of their asymptotic covariance matrices seems
to be feasible. However, some comparison at a given $\beta$ can be made. To
demonstrate this, consider the case when $L(s) \equiv s$, $p = 1$ and $\beta = 0$.
Write $x_i$ for $x_{i1}$, etc.

Note that here, with $\tau_x^2 := \sum_{i=1}^n x_i^2$,

$\mathcal{B}_n = \tau_x^{-2}\int_0^1 n^{-1}\big(\sum_{i \le ns} x_i\big)^2\, ds\int f^2\, dG$,

$C = \tau_x^{-2}\int_0^1\int_0^1 n^{-1}\sum_{i \le ns} x_i\sum_{j \le nt} x_j\,(s \wedge t)\, ds\, dt$
$\qquad \cdot\int\int [F(y \wedge z) - F(y)F(z)]\, d\psi(y)\, d\psi(z)$.

Consequently

$\Sigma_0(0) = \dfrac{\tau_x^{-2}\int_0^1\int_0^1 (s \wedge t)\,n^{-1}\sum_{i \le ns} x_i\sum_{j \le nt} x_j\, ds\, dt}{\big(\tau_x^{-2}\int_0^1 n^{-1}(\sum_{i \le ns} x_i)^2\, ds\big)^2}\,\tau^2 = \Gamma_n\tau^2$, say.

Recall that $\tau^2$ is the asymptotic variance of $\tau_x(\hat\beta_X - \beta)$. Direct
integration shows that in the cases $x_i \equiv 1$ and $x_i = i$, $\Gamma_n \to 18/15$ and
$50/21$, respectively. Thus, in the cases of the one-sample location model and
the first-degree polynomial through the origin, in terms of the asymptotic
variance, $\hat\beta_X$ dominates $\hat\beta$ with $L(s) \equiv s$ at $\beta = 0$. □
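The two limits quoted above can be checked numerically. Under the assumption that the design is generated as $x_i = x(i/n)$ for a function $x$ on $[0, 1]$ (so $x_i \equiv 1$ and $x_i \propto i$ correspond to $x(u) \equiv 1$ and $x(u) = u$; $\Gamma_n$ is unchanged by rescaling $x$), letting $n \to \infty$ in the display for $\Sigma_0(0)$ gives $\Gamma = \int_0^1 x^2\,du\cdot\int\int (s \wedge t)a(s)a(t)\,ds\,dt/(\int_0^1 a^2)^2$ with $a(s) := \int_0^s x(u)\,du$. The sketch evaluates this with midpoint sums, using $s \wedge t = \int_0^1 I(u \le s)I(u \le t)\,du$ to reduce the double integral to $\int_0^1 (\int_u^1 a)^2\,du$. The function name and grid size are illustrative.

```python
def gamma_ratio(x, m=4000):
    """Limit of Gamma_n for designs x_i = x(i/n) (p = 1, L(s) = s, beta = 0)."""
    h = 1.0 / m
    xv = [x((k + 0.5) * h) for k in range(m)]
    # a(s) = int_0^s x(u) du, evaluated at the midpoints
    a, acc = [], 0.0
    for v in xv:
        a.append(acc + 0.5 * h * v)
        acc += h * v
    # T(u) = int_u^1 a(s) ds, evaluated at the midpoints
    tail, acc = [0.0] * m, 0.0
    for k in range(m - 1, -1, -1):
        tail[k] = acc + 0.5 * h * a[k]
        acc += h * a[k]
    num = h * sum(t * t for t in tail)  # = int int (s ^ t) a(s) a(t) ds dt
    den = h * sum(v * v for v in a)     # = int_0^1 a(s)^2 ds
    return h * sum(v * v for v in xv) * num / den ** 2
```

For $x(u) \equiv 1$ this returns approximately $18/15 = 1.2$, and for $x(u) = u$ approximately $50/21 \approx 2.381$; both exceed 1, which is the dominance statement above.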
5.6b. Robustness

In a linear regression setup an estimator needs to be robust against
departures in the assumed design variables and the error distributions. As
seen in Section 5.6a, one purpose of having general weights $D$ in $\hat\beta_D$ was to
prove that $\hat\beta_X$ is asymptotically efficient among a certain class of m.d.
estimators $\{\hat\beta_D$, $D$ varies$\}$. Another purpose is to robustify these estimators
against the extremes in the design by choosing $D$ to be a bounded function
of $X$ that satisfies all other conditions of Theorem 5.6a.1. Then the
corresponding $\hat\beta_D$ would be asymptotically normal and robust against the
extremes in the design, but not as efficient as $\hat\beta_X$. This gives another
example of the phenomenon that compromises efficiency in return for
robustness. A similar remark applies to $\{\hat\beta_D^+\}$ and $\{\hat\beta_D^*\}$.

We shall now focus on the qualitative robustness (see Definition 4.4.1)
of $\hat\beta_X$ and $\hat\beta_X^*$. For simplicity, we shall write $\hat\beta$, $\hat\beta^*$ for $\hat\beta_X$, $\hat\beta_X^*$ in the rest
of the section. To begin with, consider $\hat\beta$. Recall Theorem 5.5.3 and the
notation of (5.5.37). We need to apply these to the case when the errors in
(1.1.1) are modeled to be i.i.d. $F$, but their actual d.f.'s are $\{F_{ni}\}$, $D = XA$
and $G_n \equiv G$. Then the various quantities in (5.5.37) acquire the following forms.

(1) $\Gamma_n(y) = AX'\Lambda(y)XA$, $\mathcal{B}_n = AX'\int \Lambda M\Lambda\, dG\,XA$,

$\mathcal{J}_n = -\int \Gamma_n(y)AX'[\alpha_n(y) + \Delta_n(y)]\, dG(y) = -(Z_n + b_n)$, say,

where

(2) $M := X(X'X)^{-1}X'$; $b_n := \int \Gamma_n(y)AX'\Delta_n(y)\, dG(y)$;
$\alpha_{ni}(y) := I(e_{ni} \le y) - F_{ni}(y)$,
$\Delta_{ni}(y) := F_{ni}(y) - F(y)$, $1 \le i \le n$, $y \in \mathbb{R}$;

$\alpha_n' := (\alpha_{n1}, \alpha_{n2}, \ldots, \alpha_{nn})$; $\Delta_n' := (\Delta_{n1}, \Delta_{n2}, \ldots, \Delta_{nn})$.

The assumption (5.5.1) ensures that the design matrix $X$ is of
full rank $p$. This in turn implies the existence of $\mathcal{B}_n^{-1}$ and the satisfaction
of (5.5.2), (5.5.12) in the present case. Moreover, because $G_n \equiv G$, (5.5.11)
now becomes
(3) $\liminf_n \inf_{\|\theta\| = 1} k_n(\theta) \ge \gamma$, for some $\gamma > 0$,

where

$k_n(\theta) := \theta'AX'\int \Lambda g\, dG\,XA\theta$, $\|\theta\| = 1$,

and where $g$ is a function from $\mathbb{R}$ to $[0, \infty)$, $0 < \int g^r\, dG < \infty$, $r = 1, 2$.
Because $G$ is a $\sigma$-finite measure, such a $g$ always exists.
Upon specializing Theorem 5.5.3 to the present case, we readily obtain

Corollary 5.6b.1. Assume that in (1.1.1) the actual and modeled d.f.'s
of the errors $\{e_{ni}, 1 \le i \le n\}$ are $\{F_{ni}, 1 \le i \le n\}$ and $F$, respectively. In
addition, assume that (5.5.1), (5.5.3)–(5.5.10) with $D = XA$, $H_{ni} \equiv F$,
$G_n \equiv G$, and (3) hold. Then

(4) $A^{-1}(\hat\beta - \beta) = -\mathcal{B}_n^{-1}\{Z_n + b_n\} + o_p(1)$.

Observe that $\mathcal{B}_n^{-1}b_n$ measures the amount of the asymptotic bias in
the estimator $\hat\beta$ when $F_{ni} \ne F$. Our goal here is to obtain the asymptotic
distribution of $A^{-1}(\hat\beta - \beta)$ when $\{F_{ni}\}$ converge to $F$ in a certain sense.
The achievement of this goal is facilitated by the following lemma. Recall
that for any square matrix $L$, $\|L\|_\infty := \sup\{\|t'L\|;\ \|t\| \le 1\}$. Also recall the
fact that

(5) $\|L\|_\infty \le \{\mathrm{tr.}\,LL'\}^{1/2}$,

where tr. denotes the trace operator.
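The inequality (5) says that the operator norm never exceeds the Frobenius norm. A minimal check for real $2 \times 2$ matrices is sketched below; the largest singular value is obtained in closed form from the eigenvalues of $L'L$ (the same nonzero eigenvalues as $LL'$, so it equals $\sup\{\|t'L\|;\ \|t\| \le 1\}$). The function names and the sample matrices are our own.

```python
import math

def operator_norm_2x2(L):
    """Largest singular value of a real 2x2 matrix, from the eigenvalues of L'L."""
    (a, b), (c, d) = L
    p = a * a + c * c   # (L'L)_{11}
    q = a * b + c * d   # (L'L)_{12}
    r = b * b + d * d   # (L'L)_{22}
    lam_max = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    return math.sqrt(lam_max)

def trace_bound(L):
    """{tr. LL'}^{1/2}, the right-hand side of (5)."""
    return math.sqrt(sum(v * v for row in L for v in row))
```

Equality in (5) holds for rank-one matrices such as $((0, 5), (0, 0))$, while for the identity the two sides are 1 and $\sqrt{2}$.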
Lemma 5.6b.1. Let $F$ and $G$ satisfy (5.5.68). Assume that (5.5.5)
and (5.5.10) are satisfied by $G_n \equiv G$, $\{F_{ni}\}$, $H_{ni} \equiv F$ and $D = XA$.
Moreover, assume that (5.5.3) holds and that

(6) $\rho_n := \int \big(\sum_i \|x_{ni}'A\|^2|f_{ni} - f|\big)^2\, dG = o(1)$.

Then, with $I = I_{p\times p}$,

(i) $\|\mathcal{B}_n - I\int f^2\, dG\|_\infty = o(1)$.
(ii) $\|\mathcal{B}_n^{-1} - I(\int f^2\, dG)^{-1}\|_\infty = o(1)$.
(iii) $|\mathrm{tr.}\,\mathcal{B}_n - p\int f^2\, dG| = o(1)$.
(iv) $|\int \|\Gamma_n\|^2\, dG - p\int f^2\, dG| = o(1)$.
(v) $\|b_n - \int AX'\Delta_n(y)f(y)\, dG(y)\| = o(1)$.
(vi) $\|Z_n - \int AX'\alpha_n(y)f(y)\, dG(y)\| = o_p(1)$.
(vii) $\sup_{\|\theta\| = 1}|k_n(\theta) - \int fg\, dG| = o(1)$.

Remark 5.6b.1. Note that the condition (5.5.10) with $D = XA$,
$G_n \equiv G$ now becomes

(7) $\int \|AX'\Delta_n\|^2\, dG = O(1)$.


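Proof. To begin with, because $AX'XA = I$, we obtain the relation

$\Gamma_n(y)\Gamma_n'(y) - f^2(y)I = AX'[\Lambda(y) - f(y)I]XA\cdot AX'[\Lambda(y) - f(y)I]XA$
$\quad = AX'C(y)XA\cdot AX'C(y)XA = \mathcal{D}(y)\mathcal{D}'(y)$, $y \in \mathbb{R}$,

where $C(y) := \Lambda(y) - If(y)$, $\mathcal{D}(y) := AX'C(y)XA$, $y \in \mathbb{R}$. Therefore,

(8) $\|\mathcal{B}_n - I\int f^2\, dG\|_\infty \le \sup_{\|t\| \le 1}\int \|t'\mathcal{D}(y)\|^2\, dG(y) \le \int \{\mathrm{tr.}\,\mathcal{L}\mathcal{L}'\}^{1/2}\, dG$,

where $\mathcal{L} := \mathcal{D}\mathcal{D}'$. Note that, by the C–S inequality,

(9) $\mathrm{tr.}\,\mathcal{L}\mathcal{L}' = \mathrm{tr.}\,\mathcal{D}\mathcal{D}'\mathcal{D}\mathcal{D}' \le \{\mathrm{tr.}\,\mathcal{D}\mathcal{D}'\}^2$.

Let $\delta_i := f_i - f$, $1 \le i \le n$. Then

(10) $|\mathrm{tr.}\,\mathcal{D}\mathcal{D}'| = |\mathrm{tr.}\sum_i\sum_j Ax_ix_i'A\,Ax_jx_j'A\,\delta_i\delta_j|$
$\quad = |\sum_i\sum_j \delta_i\delta_j(x_i'AAx_j)^2|$
$\quad \le \sum_i\sum_j |\delta_i\delta_j|\,\|x_i'A\|^2\,\|x_j'A\|^2$
$\quad = \big(\sum_i \|Ax_i\|^2|\delta_i|\big)^2 =: \rho$.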
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Consequently, from (8) — (10), 
2 
(11) | Gn -If PdGl, < f (2: ||Axill”|fi—£])°dG = o(1), by (6). 


This proves (i) while (ii) follows from (i) by using the determinant and 
cofactor formula for the inverses. 
Next, (iii) follows from (6) and the fact that

(12) $|\operatorname{tr}\mathcal{B}_n-p\int f^2dG|=|\int\operatorname{tr}\mathcal{D}'\mathcal{D}\,dG|\le\rho_n$, by (10).

To prove (iv), note that with $D=XA$,

$\int\|\Gamma_n\|^2dG=\sum_{i=1}^n\sum_{k=1}^n\int(x_i'AAx_k)^2f_i(y)f_k(y)\,dG(y)$.

Note that the r.h.s. is $p\int f^2dG$ in the case $f_i=f$. Thus

(13) $|\int\|\Gamma_n\|^2dG-p\int f^2dG|=|\int\operatorname{tr}\mathcal{D}'\mathcal{D}\,dG|\le\rho_n$.

This and (6) prove (iv).
Similarly, with $d_j'(y)$ denoting the $j$th row of $\mathcal{D}(y)$, $1\le j\le p$,

$\|b_n-\int AX'\Lambda_nf\,dG\|^2=\|\int\mathcal{D}AX'\Lambda_n\,dG\|^2=\sum_{j=1}^p\Big(\int d_j'(y)AX'\Lambda_n(y)\,dG(y)\Big)^2$

(14) $\le\rho_n\int\|AX'\Lambda_n(y)\|^2dG(y)$

and

(15) $\|\mathcal{Z}_n-\int AX'\alpha_n(y)f(y)\,dG(y)\|^2\le\rho_n\int\|AX'\alpha_n\|^2dG$.

Moreover,

(16) $E\int\|AX'\alpha_n\|^2dG=\int\sum_i\|x_i'A\|^2F_i(1-F_i)\,dG$.

Consequently, (v) follows from (6), (7) and (14), whereas (vi) follows from (5.5.5), (6), (15) and (16). Finally, for every $\theta\in\mathbb{R}^p$ with $\|\theta\|=1$,

$|k_n(\theta)-\int fg\,dG|=|\int\theta'\mathcal{D}\theta\,g\,dG|\le\int|\theta'\mathcal{D}\theta|\,g\,dG$.

Therefore,

$\sup_{\|\theta\|=1}|k_n(\theta)-\int fg\,dG|\le\int\big(\sum_i\|Ax_i\|^2|f_i-f|\big)g\,dG\le\rho_n^{1/2}\{\int g^2dG\}^{1/2}=o(1)$, by (6). □


Corollary 5.6b.2. Assume that (1.1.1) holds with the actual and the modeled d.f.'s of $\{e_{ni}, 1\le i\le n\}$ equal to $\{F_{ni}, 1\le i\le n\}$ and $F$, respectively. In addition, assume that (5.5.1), (5.5.3)–(5.5.7), (5.5.9), (5.5.10) with $D=XA$, $H_{ni}=F$, $G_n=G$; (5.5.68) and (6) hold.

Then (5.5.8) and (2) are satisfied and

(17) $A^{-1}(\hat\beta-\beta)=-(\int f^2dG)^{-1}\{\mathcal{Z}_n+b_n\}+o_p(1)$,

where

$\mathcal{Z}_n:=\int AX'\alpha_n(y)\,d\psi(y)=A\sum_ix_{ni}[\psi(e_{ni})-\int\psi(x)\,dF_{ni}(x)]$,

$b_n:=\int AX'\Lambda_n(y)\,d\psi(y)=\int\sum_iAx_{ni}[F_{ni}-F]\,d\psi$,

with $\psi$ as in (5.6a.2). □

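Consider $\mathcal{Z}_n$. Note that with $\sigma_{ni}^2:=\operatorname{Var}\{\psi(e_{ni})|F_{ni}\}$, $1\le i\le n$,

$E\mathcal{Z}_n\mathcal{Z}_n'=\sum_iAx_{ni}x_{ni}'A\,\sigma_{ni}^2$.

One can rewrite

$\sigma_{ni}^2=\iint[F_{ni}(x\wedge y)-F_{ni}(x)F_{ni}(y)]\,d\psi(x)\,d\psi(y)$, $1\le i\le n$.

By (5.5.68a), $\psi$ is nondecreasing and bounded. Hence $\max_i\|F_{ni}-F\|_\infty\to0$ readily implies that $\max_i|\sigma_{ni}^2-\sigma^2|\to0$, where $\sigma^2:=\operatorname{Var}\{\psi(e)|F\}$. Moreover, we have the inequality

$\|E\mathcal{Z}_n\mathcal{Z}_n'-\sigma^2I_{p\times p}\|\le\sum_i\|Ax_{ni}\|^2|\sigma_{ni}^2-\sigma^2|$.

It thus readily follows from the L–F CLT that (5.5.1) implies $\mathcal{Z}_n\to_dN(0,\sigma^2I_{p\times p})$ if $\max_i\|F_{ni}-F\|_\infty\to0$. Consequently, we have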


Theorem 5.6b.1 (Qualitative Robustness). Assume the same setup and conditions as in Corollary 5.6b.2. In addition, suppose that

(18) $\max_i\|F_{ni}-F\|_\infty=o(1)$,
(19) $\|A\|_\infty=o(1)$.

Then the distribution of $\hat\beta$ under $\prod_{i=1}^nF_{ni}$ converges weakly to the distribution degenerate at $\beta$.


Proof. It suffices to show that the asymptotic bias is bounded. To that effect we have the inequality

$\|(\int f^2dG)^{-1}b_n\|^2\le(\int f^2dG)^{-1}\int\|AX'\Lambda_n\|^2dG<\infty$, by (7).

From this, (17), and the above discussion about $\{\mathcal{Z}_n\}$, we obtain that for every $\eta>0$ there exists a $K_\eta$ such that $P_n(E_\eta)\ge1-\eta$ for all sufficiently large $n$, where $P_n$ denotes the probability under $\prod_{i=1}^nF_{ni}$ and $E_\eta:=\{\|A^{-1}(\hat\beta-\beta)\|\le K_\eta\}$. The theorem now follows from this and the elementary inequality $\|\hat\beta-\beta\|\le\|A\|_\infty\|A^{-1}(\hat\beta-\beta)\|$. □


Remark 5.6b.2. The conditions (6) and (18) together need not imply (5.5.7), (5.5.9) and (5.5.10). The condition (5.5.10) is heavily dependent on the rate of convergence in (18). Note that

(20) $\|b_n\|^2\le\min\{\psi(\mathbb{R})\int\|AX'\Lambda_n\|^2d\psi,\ (\int f^2dG)\int\|AX'\Lambda_n\|^2dG\}$.

This inequality shows that, because of (5.5.68), it is possible to have $\|b_n\|=O(1)$ even if (7) (or (5.5.10) with $D=XA$) is not satisfied. However, our general theory requires (7) anyway.

Now, with $\gamma=\psi$ or $G$,

(21) $\int\|AX'\Lambda_n\|^2d\gamma=\int\sum_i\sum_jx_i'AAx_j\,\Lambda_i\Lambda_j\,d\gamma\le\int\big(\sum_i\|Ax_i\|\,|\Lambda_i|\big)^2d\gamma$.

Thus, if

(22) $\sum_i\|Ax_i\|\,|F_i(y)-F(y)|\le k\,\Lambda_n^*(y)$, $y\in\mathbb{R}$,

where $k$ is a constant and $\Lambda_n^*$ is a function such that

(23) $\limsup_n\int(\Lambda_n^*)^2d\gamma<\infty$,

then (7) would be satisfied and, in view of (20), $\|b_n\|=O(1)$.

Inequality (22) clearly shows that not every sequence $\{F_{ni}\}$ satisfying (6), (18) and (5.5.3)–… with $D=XA$ will satisfy (7). The rate at which $F_{ni}\to F$ is crucial for the validity of (7) or (22). □


We now discuss two interesting examples. 


Example 5.6b.1. $F_{ni}=(1-\delta_{ni})F+\delta_{ni}F_0$, $1\le i\le n$. This is Case 5.5.2. From Claim 5.5.3, (5.5.5)–(5.5.9) are satisfied by this model as long as (5.5.68)–(5.5.70) and (5.5.1) hold. To see if (6) and (7) are satisfied, note that here

$\rho_n=\int\big(\sum_i\|Ax_i\|^2\delta_i|f-f_0|\big)^2dG\le2\max_i\delta_i^2\,p^2\int(f^2+f_0^2)\,dG$,

and

$\sum_i\|Ax_i\|\,|F_i-F|=\sum_i\|Ax_i\|\delta_i|F-F_0|$.

Consequently, here (6) is implied by (5.5.68) for $(f,G)$, $(f_0,G)$ and by (5.5.71), while (7) follows from (5.5.72) and (21)–(23) upon taking $\Lambda_n^*=|F-F_0|$, provided we additionally assume that

(24) $\sum_i\|Ax_i\|\delta_i=O(1)$.

There are two obvious choices of $\{\delta_i\}$ that satisfy (24). They are:

(25) (a) $\delta_i=n^{-1/2}$, or (b) $\delta_i=p^{-1/2}\|Ax_i\|$, $1\le i\le n$.


The gross error models with $\{\delta_i\}$ given by (25b) are more natural than those given by (25a) for linear regression models with unbounded designs. We suggest that in these models the proportion of contamination one can allow for the $i$th observation is $p^{-1/2}\|Ax_i\|$. If $\delta_i$ is larger than this, in the sense that $\sum_i\|Ax_i\|\delta_i\to\infty$, then the bias of $\hat\beta$ blows up.
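The two choices in (25) are easy to compare numerically. The sketch below assumes, purely for illustration, a simple linear regression design with rows $x_i'=(1,i)$, computes $\|Ax_i\|$ with $A=(X'X)^{-1/2}$, and evaluates the sum in (24) for both choices. Because $\sum_i\|Ax_i\|^2=\operatorname{tr}(AX'XA)=p$, choice (25b) gives exactly $p^{1/2}$, and by the Cauchy–Schwarz inequality choice (25a) gives at most $p^{1/2}$. The function name is hypothetical.

```python
import numpy as np

def contamination_sums(X):
    """Return sum_i ||A x_i|| delta_i under (25a) and (25b), plus the (25b) proportions."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    w, V = np.linalg.eigh(X.T @ X)
    A = V @ np.diag(w ** -0.5) @ V.T          # A = (X'X)^{-1/2}
    norms = np.linalg.norm(X @ A, axis=1)     # ||A x_i|| (rows of XA)
    delta_a = np.full(n, n ** -0.5)           # choice (25a)
    delta_b = norms / np.sqrt(p)              # choice (25b)
    return float(np.sum(norms * delta_a)), float(np.sum(norms * delta_b)), delta_b

n = 200
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])  # hypothetical design x_i' = (1, i)
sum_a, sum_b, delta_b = contamination_sums(X)
print(round(sum_a, 4), round(sum_b, 4))
```

Under (25b) the rows with the largest $\|Ax_i\|$, here the two ends of the design, are allotted the largest contamination proportions, while each $\delta_i$ stays below $p^{-1/2}$ because $\|Ax_i\|^2$ is a leverage and hence at most 1.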


Note that if $G$ is a finite measure, $f$ is uniformly continuous and $\{\delta_i\}$ are given by (25b), then all the conditions of the above theorem are satisfied by the above $\{F_{ni}\}$ and $F$. Thus we have

Corollary 5.6b.3. Every $\hat\beta$ corresponding to a finite measure $G$ is qualitatively robust for $\beta$ against heteroscedastic gross errors at all those $F$'s which have uniformly continuous densities, provided $\{\delta_i\}$ are given by (25b) and provided (5.5.1) and (19) hold. □
Example 5.6b.2. Here we consider $\{F_{ni}\}$ given in Case 5.5.3. We leave it to the reader to verify that one choice of $\{\sigma_{ni}\}$ that implies (7) is to take

(26) $\sigma_{ni}=\|Ax_{ni}\|$, $1\le i\le n$.

One can also verify that in this case (5.5.68)–(5.5.70), (5.5.75) and (5.5.76) entail all the conditions of Theorem 5.6b.1. Again, the following corollary holds.

Corollary 5.6b.4. Every $\hat\beta$ corresponding to a finite measure $G$ is qualitatively robust for $\beta$ against heteroscedastic scale errors at all those $F$'s which have uniformly continuous densities, provided $\{\sigma_{ni}\}$ are given by (26) and provided (5.5.1) and (19) hold. □


As an example of a σ-finite $G$ with $G(\mathbb{R})=\infty$ that yields a robust estimator, consider $G(y)=(2/3)y$. Assume that the following hold.

(i) $F$, $F_0$ have continuous densities $f$, $f_0$; $0<\int f^2d\lambda$, $\int f_0^2d\lambda<\infty$.
(ii) $\int F(1-F)\,d\lambda<\infty$. (iii) $\int|F-F_0|\,d\lambda<\infty$.

Then the corresponding $\hat\beta$ is qualitatively robust at $F$ against the heteroscedastic gross errors of Example 5.6b.1 with $\{\delta_{ni}\}$ given by (25b).

Recall, from Remark 5.6a.1, that this $\hat\beta$ is also asymptotically efficient at logistic errors. Thus we have an m.d. estimator $\hat\beta$ that is asymptotically efficient and qualitatively robust at the logistic error d.f. against the above gross error models!!

We leave it to an interested reader to obtain analogues of the above results for $\beta^+$ and $\beta^*$. The reader will find Theorems 5.5.4 and 5.5.6 useful here. □


5.6c Locally Asymptotically Minimax Property 


In this subsection we shall show that the class of m.d. estimators $\{\beta^+\}$ is locally asymptotically minimax (l.a.m.) in the Hájek–Le Cam sense (Hájek (1972), Le Cam (1972)). In order to achieve this goal we need to recall an inequality from Beran (1982) that gives a lower bound on the local asymptotic minimax risk for estimators of Hellinger differentiable functionals on the class of product probability measures. Accordingly, let $Q_{ni}$, $P_{ni}$ be probability measures on $(\mathbb{R},\mathcal{B})$, and let $\mu_{ni}$, $\nu_{ni}$ be σ-finite measures on $(\mathbb{R},\mathcal{B})$ with $\nu_{ni}$ dominating $Q_{ni}$ and $P_{ni}$; $q_{ni}:=dQ_{ni}/d\nu_{ni}$, $p_{ni}:=dP_{ni}/d\nu_{ni}$, $1\le i\le n$. Let $Q^n:=Q_{n1}\times\cdots\times Q_{nn}$ and $P^n:=P_{n1}\times\cdots\times P_{nn}$, and let $\Pi^n$ denote the class of all $n$-fold product probability measures $\{Q^n\}$ on $(\mathbb{R}^n,\mathcal{B}^n)$.




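Define, for a $c>0$ and for sequences $0<\eta_{n1}\to0$, $0<\eta_{n2}\to0$,

$\mathcal{H}_n(P^n,c):=\{Q^n\in\Pi^n;\ \sum_i\int(q_{ni}^{1/2}-p_{ni}^{1/2})^2d\nu_{ni}\le c^2\}$,

$\mathcal{J}_n(P^n,c,\eta_n):=\{Q^n\in\Pi^n;\ Q^n\in\mathcal{H}_n(P^n,c),\ \max_i\int(q_{ni}-p_{ni})^2d\mu_{ni}\le\eta_{n1},\ \max_i\int(q_{ni}^{1/2}-p_{ni}^{1/2})^2d\nu_{ni}\le\eta_{n2}\}$,

where $\eta_n':=(\eta_{n1},\eta_{n2})$.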


DEFINITION 5.6c.1. A sequence of vector-valued functionals $\{S_n:\Pi^n\to\mathbb{R}^p,\ n\ge1\}$ is Hellinger- (H-) differentiable at $\{P^n\in\Pi^n\}$ if there exist a triangular array of $p\times1$ random vectors $\{\xi_{ni}, 1\le i\le n\}$ and a sequence of $p\times p$ matrices $\{A_n, n\ge1\}$ having the following properties:

(i) $\int\xi_{ni}\,dP_{ni}=0$, $\int\|\xi_{ni}\|^2dP_{ni}<\infty$, $1\le i\le n$; $\sum_i\int\xi_{ni}\xi_{ni}'\,dP_{ni}=I_{p\times p}$.
(ii) For every $0<c<\infty$ and every sequence $\eta_n\to0$,
$\sup\|A_n\{S_n(Q^n)-S_n(P^n)\}-2\sum_i\int\xi_{ni}p_{ni}^{1/2}(q_{ni}^{1/2}-p_{ni}^{1/2})\,d\nu_{ni}\|=o(1)$,
where the supremum is over all $Q^n\in\mathcal{J}_n(P^n,c,\eta_n)$.
(iii) For every $\epsilon>0$ and every $a\in\mathbb{R}^p$ with $\|a\|=1$,
$\sum_i\int(a'\xi_{ni})^2I(|a'\xi_{ni}|>\epsilon)\,dP_{ni}=o(1)$.


Now, let $X_{n1},\ldots,X_{nn}$ be independent r.v.'s with $Q_{n1},\ldots,Q_{nn}$ denoting their respective distributions, and let $\hat S_n=\hat S_n(X_{n1},\ldots,X_{nn})$ be an estimator of $S_n(Q^n)$. Let $\ell$ be a nondecreasing bounded function on $[0,\infty)$ to $[0,\infty)$ and define the risk of estimating $S_n$ by $\hat S_n$ to be

(1) $R_n(\hat S_n,Q^n):=E^n_Q\{\ell(\|A_n\{\hat S_n-S_n(Q^n)\}\|)\}$,

where $E^n_Q$ is the expectation under $Q^n$.


Theorem 5.6c.1. Suppose that $\{S_n:\Pi^n\to\mathbb{R}^p,\ n\ge1\}$ is a sequence of H-differentiable functionals and that the sequence $\{P^n\in\Pi^n\}$ is such that

(2) $\max_i\int p_{ni}^2\,d\mu_{ni}=O(1)$.

Then

(3) $\lim_{c\to\infty}\liminf_n\inf_{\hat S_n}\sup_{Q^n\in\mathcal{J}_n(P^n,c,\eta_n)}R_n(\hat S_n,Q^n)\ge E\,\ell(\|Z\|)$,

where $Z$ is a $N(0,I_{p\times p})$ r.v.


Sketch of a proof. This is a reformulation of a result of Beran (1982), pp. 425–426. He actually proved (3) with $\mathcal{J}_n(P^n,c,\eta_n)$ replaced by $\mathcal{H}_n(P^n,c)$ and without requiring (2). The assumption (2) is an assumption on the fixed sequence $\{P^n\}$ of probability measures. Beran's proof proceeds as follows.

Under (i) and (iii), there exists a sequence of probability measures $\{Q^n(h)\}$ such that for every $0<b<\infty$,

(4) $\sup_{\|h\|\le b}\sum_i\int\{q_{ni}^{1/2}(h)-p_{ni}^{1/2}-(1/2)h'\xi_{ni}p_{ni}^{1/2}\}^2d\nu_{ni}=o(1)$.

Consequently,

(5) $\sup_{\|h\|\le b}\big|\sum_i\int\{q_{ni}^{1/2}(h)-p_{ni}^{1/2}\}^2d\nu_{ni}-\|h\|^2/4\big|=o(1)$,

and for $n$ sufficiently large, the family $\{Q^n(h), \|h\|\le b, h\in\mathbb{R}^p\}$ is a subset of $\mathcal{H}_n(P^n,b/2)$. Hence, for every $c>0$ and every sequence of statistics $\{\hat S_n\}$,

(6) $\liminf_n\inf_{\hat S_n}\sup_{Q^n\in\mathcal{H}_n(P^n,c)}R_n(\hat S_n,Q^n)\ge\liminf_n\inf_{\hat S_n}\sup_{\|h\|\le2c}R_n(\hat S_n,Q^n(h))$.

Then the proof proceeds as in the Hájek–Le Cam setup for the parametric family $\{Q^n(h), \|h\|\le b\}$, under the l.a.n. property of this family with $b=2c$, which is implied by (4).

Thus (3) would be proved if we verify (6) with $\mathcal{H}_n(P^n,c)$ replaced by $\mathcal{J}_n(P^n,c,\eta_n)$, under the additional assumption (2). That is, we have to show that there exist sequences $0<\eta_{n1}\to0$, $0<\eta_{n2}\to0$ such that the above family $\{Q^n(h), \|h\|\le b\}$ is a subset of $\mathcal{J}_n(P^n,b/2,\eta_n)$ for sufficiently large $n$. To that effect we recall the family $\{Q^n(h)\}$ from Beran. With $\xi_{ni}$ as in (i)–(iii), let $\xi_{nij}$ denote the $j$th component of $\xi_{ni}$, $1\le j\le p$, $1\le i\le n$. By (iii) there exists a sequence $\epsilon_n>0$, $\epsilon_n\downarrow0$, such that

$\max_{1\le j\le p}\sum_i\int\xi_{nij}^2I(|\xi_{nij}|>\epsilon_n)\,dP_{ni}=o(1)$.


Now, define

$\xi^*_{nij}:=\xi_{nij}I(|\xi_{nij}|\le\epsilon_n)$, $\tilde\xi_{nij}:=\xi^*_{nij}-\int\xi^*_{nij}\,dP_{ni}$, $1\le j\le p$,
$\tilde\xi_{ni}:=(\tilde\xi_{ni1},\ldots,\tilde\xi_{nip})'$, $1\le i\le n$.

Note that

(7) $\|\tilde\xi_{ni}\|\le2p\epsilon_n$, $\int\tilde\xi_{ni}\,dP_{ni}=0$, $1\le i\le n$.

For $0<b<\infty$, $\|h\|\le b$, $1\le i\le n$, define

$q_{ni}(h):=(1+h'\tilde\xi_{ni})\,p_{ni}$ if $\epsilon_n<(2bp)^{-1}$, and $q_{ni}(h):=p_{ni}$ if $\epsilon_n\ge(2bp)^{-1}$.

Because of (7), $\{q_{ni}(h), \|h\|\le b, 1\le i\le n\}$ are probability density functions. Let $\{Q_{ni}(h), \|h\|\le b, 1\le i\le n\}$ denote the corresponding probability measures and $Q^n(h):=Q_{n1}(h)\times\cdots\times Q_{nn}(h)$.
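The role of (7) can be seen in a discrete toy version of this construction, in which $\nu_{ni}$ is counting measure on three points, $p=1$, and the values of $P_{ni}$ and $\xi_{ni}$ are made up for illustration. The sketch (function name hypothetical) truncates at $\epsilon_n$, recentres under $P_{ni}$, and checks that $(1+h\tilde\xi_{ni})p_{ni}$ is again a probability vector whenever $|h|\le b$ and $\epsilon_n<(2b)^{-1}$.

```python
def perturbed_probs(probs, xi, eps, h, b):
    """Discrete, one-dimensional (p = 1) version of q_i(h) = (1 + h * xi_tilde) p_i."""
    if abs(h) > b:
        raise ValueError("need |h| <= b")
    xi_star = [x if abs(x) <= eps else 0.0 for x in xi]        # xi* = xi I(|xi| <= eps)
    centre = sum(pr * x for pr, x in zip(probs, xi_star))     # integral of xi* dP
    xi_tilde = [x - centre for x in xi_star]                  # |xi_tilde| <= 2 eps
    if eps >= 1.0 / (2.0 * b):                                # (2bp)^{-1} with p = 1
        return list(probs)
    return [(1.0 + h * x) * pr for pr, x in zip(probs, xi_tilde)]

q = perturbed_probs([0.2, 0.3, 0.5], [-1.5, 0.4, 0.9], eps=0.45, h=-1.0, b=1.0)
print([round(v, 3) for v in q], round(sum(q), 12))   # [0.224, 0.216, 0.56] 1.0
```

Positivity comes from $|h\tilde\xi|\le2b\epsilon_n<1$, and the total mass stays 1 because $\tilde\xi$ is centred under $P$.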


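Now, note that for $\|h\|\le b$, $1\le i\le n$,

$\int(q_{ni}(h)-p_{ni})^2d\mu_{ni}=0$ if $\epsilon_n\ge(2bp)^{-1}$, and $=\int(h'\tilde\xi_{ni})^2p_{ni}^2\,d\mu_{ni}$ if $\epsilon_n<(2bp)^{-1}$.

Consequently, since $\epsilon_n\downarrow0$, $\epsilon_n<(2bp)^{-1}$ eventually, and

$\sup_{\|h\|\le b}\max_i\int(q_{ni}(h)-p_{ni})^2d\mu_{ni}\le(2p\epsilon_n)^2b^2\max_i\int p_{ni}^2\,d\mu_{ni}=:\eta_{n1}$.

Similarly, for sufficiently large $n$,

$\sup_{\|h\|\le b}\max_i\int(q_{ni}^{1/2}(h)-p_{ni}^{1/2})^2d\nu_{ni}\le2bp\epsilon_n=:\eta_{n2}$, say.

Because of (2) and because $\epsilon_n\downarrow0$, $\max\{\eta_{n1},\eta_{n2}\}\to0$.

Consequently, for every $b>0$ and for $n$ sufficiently large, $\{Q^n(h), \|h\|\le b\}$ is a subset of $\mathcal{J}_n(P^n,b/2,\eta_n)$ with the above $\eta_{n1}$, $\eta_{n2}$, and an analogue of (6) with $\mathcal{H}_n(P^n,c)$ replaced by $\mathcal{J}_n(P^n,b/2,\eta_n)$ holds. The rest is the same as in Beran. □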


We shall now show that $\beta^+$ achieves the lower bound in (3). Fix a $\beta\in\mathbb{R}^p$ and consider the model (1.1.1). As before, let $F_{ni}$ be the actual d.f. of $e_{ni}$, $1\le i\le n$, and suppose we model the errors to be i.i.d. $F$, $F$ symmetric around zero. The d.f. $F$ need not be known. Then the actual and the modeled d.f.'s of $Y_{ni}$ of (1.1.1) are $F_{ni}(\cdot-x_{ni}'\beta)$ and $F(\cdot-x_{ni}'\beta)$, respectively. In Theorem 5.6c.1 take $X_{ni}=Y_{ni}$ and $\{Q_{ni}, P_{ni}, \mu_{ni}, \nu_{ni}\}$ as follows:

(8) $Q^\beta_{ni}(Y_{ni}\le\cdot)=F_{ni}(\cdot-x_{ni}'\beta)$, $P^\beta_{ni}(Y_{ni}\le\cdot)=F(\cdot-x_{ni}'\beta)$, $\mu^\beta_{ni}(\cdot)=G(\cdot-x_{ni}'\beta)$, $\nu_{ni}\equiv\lambda$, $1\le i\le n$.

Also, let $Q^n_\beta:=Q^\beta_{n1}\times\cdots\times Q^\beta_{nn}$ and $P^n_\beta:=P^\beta_{n1}\times\cdots\times P^\beta_{nn}$. The absence of $\beta$ from the sub- or the superscript of a probability measure indicates that the measure is being evaluated at $\beta=0$. Thus, for example, we write $Q^n$ for $Q^n_0$ $(=\prod_{i=1}^nF_{ni})$ and $P^n$ for $P^n_0$, etc. Also, for an integrable function $g$, write $\int g$ for $\int g\,d\lambda$.

Let $f_{ni}$, $f$ denote the respective densities of $F_{ni}$, $F$ w.r.t. $\lambda$. Then $q^\beta_{ni}(\cdot)=f_{ni}(\cdot-x_{ni}'\beta)$, $p^\beta_{ni}(\cdot)=f(\cdot-x_{ni}'\beta)$ and, because of the translation invariance of the Lebesgue measure,

(9) $\mathcal{H}_n(P^n_\beta,c)=\{Q^n_\beta\in\Pi^n;\ \sum_i\int\{(q^\beta_{ni})^{1/2}-(p^\beta_{ni})^{1/2}\}^2\le c^2\}=\{Q^n\in\Pi^n;\ \sum_i\int(f_{ni}^{1/2}-f^{1/2})^2\le c^2\}=\mathcal{H}_n(P^n,c)$.

That is, the set $\mathcal{H}_n(P^n_\beta,c)$ does not depend on $\beta$. Similarly,

$\mathcal{J}_n(P^n_\beta,c,\eta_n)=\{Q^n\in\Pi^n;\ Q^n\in\mathcal{H}_n(P^n,c),\ \max_i\int(f_{ni}-f)^2dG\le\eta_{n1},\ \max_i\int(f_{ni}^{1/2}-f^{1/2})^2\le\eta_{n2}\}=\mathcal{J}_n(P^n,c,\eta_n)$.


Next we need to define the relevant functionals. For $t\in\mathbb{R}^p$, $y\in\mathbb{R}$, $1\le i\le n$, define

(10) $m_{ni}(y,t):=F_{ni}(y+x_{ni}'(t-\beta))-1+F_{ni}(-y+x_{ni}'(t-\beta))$,
$b_n(y,t):=\sum_iAx_{ni}\,m_{ni}(y,t)$,
$\rho_n(t,Q^n_\beta)=\rho_n(t,\mathbf{F}):=\int\|b_n(y,t)\|^2dG(y)$,
$\mathbf{F}':=(F_{n1},\ldots,F_{nn})$.

Now, recall the definition of $\psi$ from (5.6a.2) and let $T_n(\beta,Q^n_\beta)=T_n(\beta,\mathbf{F})$ be defined by the relation

(11) $T_n(\beta,\mathbf{F})=\beta+\big(X'X\int f^2dG\big)^{-1}\int\sum_ix_{ni}[F_{ni}(y)-1+F_{ni}(-y)]\,d\psi(y)$.

Note that, with $b_n(y):=b_n(y,\beta)$,

(12) $A^{-1}(T_n(\beta,\mathbf{F})-\beta)=(\int f^2dG)^{-1}\int b_n(y)\,d\psi(y)$.

Sometimes we shall write $T_n(\mathbf{F})$ for $T_n(\beta,\mathbf{F})$. Observe that if $\{F_{ni}\}$ are symmetric around 0, then $T_n(\beta,\mathbf{F})=\beta=T_n(\beta,P^n_\beta)$. In general, the quantity $A^{-1}(T_n(\mathbf{F})-\beta)$ measures the asymptotic bias in $\beta^+$ due to the asymmetry of the errors.
We shall prove the l.a.m. property of $\beta^+$ by showing that $T_n$ is H-differentiable and that $\beta^+$ is an estimator of $T_n$ that achieves the lower bound in (3). To that effect we first state a lemma. Its proof follows from Theorem 5.5.4 in the same fashion as that of Lemma 5.6b.1 and Corollary 5.6b.2 from Theorem 5.5.3. Observe that the conditions (5.5.35) and (5.5.11+) with $D=XA$, respectively, become

(13) $\int\|b_n(y)\|^2dG(y)=O(1)$,
(14) $\liminf_n\inf_{\|\theta\|=1}\theta'AX'\big(\int\Lambda^+g\,dG\big)XA\,\theta\ge a$, for an $a>0$,

where $\Lambda^+$ is defined at (5.5.38) and $g$ is as in (5.6a.3).

Lemma 5.6c.1. Assume that (1.1.1) holds with the actual d.f.'s of $\{e_{ni}, 1\le i\le n\}$ equal to $\{F_{ni}, 1\le i\le n\}$, and suppose that we model the errors to be i.i.d. $F$, $F$ symmetric around zero. In addition, assume that (5.3.8), (5.5.1), (5.5.3), (5.5.4), (5.5.6), (5.5.7), (5.5.9) with $D=XA$, $G_n=G$; (5.5.68), (5.6a.13), (5.6b.6) and (13) hold. Then (5.5.8) and its variant where the argument $y$ in the integrand is replaced by $-y$, (5.5.33), (14) and the following hold:

(15) $A^{-1}(\beta^+-T_n(\mathbf{F}))=-\{2\int f^2dG\}^{-1}\mathcal{Z}_n+o_p(1)$, under $\{Q^n\}$,

where

(16) $\mathcal{Z}_n:=\sum_iAx_{ni}\{\psi(-e_{ni})-\psi(e_{ni})-\int m_{ni}(y)\,d\psi(y)\}$,

with $m_{ni}(y):=m_{ni}(y,\beta)$ and $\psi$ as in (5.6a.2). □


Now, define, for an $0<a<\infty$,

$\mathcal{M}_n(P^n,a):=\{Q^n\in\Pi^n;\ Q^n=\prod_{i=1}^nF_{ni},\ \max_i\int|f_{ni}-f|^r\,dG\to0,\ r=1,2,\ \max_i\|F_{ni}-F\|_\infty\to0,\ \int\big(\sum_i\|Ax_{ni}\|\,|F_{ni}-F|\big)^2dG\le a^2\}$.

Lemma 5.6c.2. Assume that (1.1.1) holds with the actual d.f.'s of $\{e_{ni}, 1\le i\le n\}$ equal to $\{F_{ni}, 1\le i\le n\}$, and suppose that we model the errors to be i.i.d. $F$, $F$ symmetric around zero. In addition, assume that (5.3.8), (5.5.1), (5.5.68) and the following hold:

(17) $G$ is a finite measure.

Then, for every $0<a<\infty$ and sufficiently large $n$,

(18) $\mathcal{J}_n(P^n,b_a,\eta_n)\subset\mathcal{M}_n(P^n,a)$, $b_a:=(4p\alpha)^{-1/2}a$, $\alpha:=G(\mathbb{R})$.

Moreover, all assumptions of Lemma 5.6c.1 are satisfied.


Proof. Fix an $0<a<\infty$. It suffices to show that

(19) $\sum_i\int(f_{ni}^{1/2}-f^{1/2})^2\le b_a^2$, $n\ge1$,

and

(20) (a) $\max_i\int(f_{ni}-f)^2dG\le\eta_{n1}$, $n\ge1$,
(b) $\max_i\int(f_{ni}^{1/2}-f^{1/2})^2\le\eta_{n2}$, $n\ge1$,

imply all the conditions describing $\mathcal{M}_n(P^n,a)$.

Claim: (19) implies $\int\big(\sum_i\|Ax_{ni}\|\,|F_{ni}-F|\big)^2dG\le a^2$, $n\ge1$.

By the C–S inequality,

(21) $|F_{ni}(x)-F(x)|^2=\big|\int_{-\infty}^x(f_{ni}-f)\big|^2\le\int_{-\infty}^x(f_{ni}^{1/2}-f^{1/2})^2\cdot\int_{-\infty}^x(f_{ni}^{1/2}+f^{1/2})^2\le4\int(f_{ni}^{1/2}-f^{1/2})^2$, $1\le i\le n$, $x\in\mathbb{R}$.

Hence,

$\int\big(\sum_i\|Ax_{ni}\|\,|F_{ni}-F|\big)^2dG\le\sum_i\|Ax_{ni}\|^2\cdot\sum_i\int(F_{ni}-F)^2dG\le4p\alpha\sum_i\int(f_{ni}^{1/2}-f^{1/2})^2$,

which, together with (19), proves the Claim.

The finiteness of $G$ together with (21) and (20b) with $\eta_{n2}\to0$ imply that $\max_i\|F_{ni}-F\|_\infty\to0$ in a routine fashion. The rest uses (5.5.66), (5.5.67), and the details are straightforward. □
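For a concrete feel for (21), take $F=N(0,1)$ and $F_{ni}$ its location shift by $\theta$. The sketch below, which uses only the Python standard library and a plain Riemann sum (an illustrative numerical choice, not part of the argument), computes $\int(f_{ni}^{1/2}-f^{1/2})^2$, compares it with the known closed form $2(1-e^{-\theta^2/8})$ for the normal shift, and checks the bound $\sup_x|F_{ni}(x)-F(x)|^2\le4\int(f_{ni}^{1/2}-f^{1/2})^2$. The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

STD = NormalDist()

def hellinger_sq_shift(theta, lo=-20.0, hi=20.0, m=40001):
    """Riemann sum for the integral of (f_theta^{1/2} - f^{1/2})^2, f = N(0,1) density."""
    h = (hi - lo) / (m - 1)
    total = 0.0
    for k in range(m):
        x = lo + k * h
        total += (sqrt(STD.pdf(x - theta)) - sqrt(STD.pdf(x))) ** 2
    return total * h

def kolmogorov_shift(theta):
    """sup_x |Phi(x - theta) - Phi(x)|, attained at x = theta / 2."""
    return abs(2.0 * STD.cdf(abs(theta) / 2.0) - 1.0)

for theta in (0.1, 0.5, 1.0, 3.0):
    h2 = hellinger_sq_shift(theta)
    closed = 2.0 * (1.0 - exp(-theta ** 2 / 8.0))
    print(theta, round(h2, 6), round(closed, 6), kolmogorov_shift(theta) ** 2 <= 4.0 * h2)
```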
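Now let $\tilde\psi(y):=\psi(-y)-\psi(y)$, $y\in\mathbb{R}$. Note that $d\psi(-y)=-d\psi(y)$, $d\tilde\psi=-2\,d\psi$, $d\psi=f\,dG$, and $\tilde\psi(e)$ is symmetric around 0 when $e$ has d.f. $F$. Let

$\tilde\sigma^2:=\operatorname{Var}\{\tilde\psi(e)|F\}$, $\tau:=\int f^2dG$, $\rho:=\tilde\psi/\tilde\sigma$,
$\xi_{ni}=\xi_{ni}(Y_{ni},\beta):=Ax_{ni}\,\rho(e_{ni})$.

Use the above facts to obtain

$2\sum_i\int\xi_{ni}(y,\beta)(p^\beta_{ni}(y))^{1/2}\{(q^\beta_{ni}(y))^{1/2}-(p^\beta_{ni}(y))^{1/2}\}\,dy=2\sum_iAx_{ni}\int\rho f^{1/2}(f_{ni}^{1/2}-f^{1/2})$
$=\sum_iAx_{ni}\{\int\rho f_{ni}-\int\rho(f_{ni}^{1/2}-f^{1/2})^2\}$
$=\sum_iAx_{ni}\{-\tilde\sigma^{-1}\int[F_{ni}-F]\,d\tilde\psi-\int\rho(f_{ni}^{1/2}-f^{1/2})^2\}$

(22) $=\sum_iAx_{ni}\{2\tilde\sigma^{-1}\int[F_{ni}-F]f\,dG-\int\rho(f_{ni}^{1/2}-f^{1/2})^2\}$.

The last but one equality follows from integrating the first term by parts.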
Now consider the r.h.s. of (12). Note that because $F$ and $G$ are symmetric around 0,

$\int b_n\,d\psi=\int\sum_iAx_{ni}[F_{ni}(y)-1+F_{ni}(-y)]\,d\psi(y)=\int\sum_iAx_{ni}[F_{ni}(y)-F(y)+F_{ni}(-y)-F(-y)]\,d\psi(y)$

(23) $=2\sum_iAx_{ni}\int[F_{ni}-F]f\,dG$.

Recall that by definition $T_n(\beta,P^n_\beta)=\beta$. Now take $A_n$ of (ii) of the H-differentiability requirement to be $\tau\tilde\sigma^{-1}A^{-1}$ and conclude from (18), (22) and (23) that

$\big\|A_n\{T_n(\beta,Q^n_\beta)-T_n(\beta,P^n_\beta)\}-2\sum_i\int\xi_{ni}(y,\beta)(p^\beta_{ni}(y))^{1/2}\{(q^\beta_{ni}(y))^{1/2}-(p^\beta_{ni}(y))^{1/2}\}\,dy\big\|$
$=\big\|\sum_iAx_{ni}\int\rho(f_{ni}^{1/2}-f^{1/2})^2\big\|\le\max_i\|Ax_{ni}\|\cdot\|\rho\|_\infty\,b_a^2=o(1)$,

uniformly for $\{Q^n\}\in\mathcal{J}_n(P^n,b_a,\eta_n)$.


This proves that requirement (ii) of Definition 5.6c.1 is satisfied by the functional $T_n$, with $\{\xi_{ni}\}$ given as above. The fact that these $\{\xi_{ni}\}$ satisfy (i) and (iii) of Definition 5.6c.1 follows from (5.5.1), (17), (18) and the symmetry of $F$. This then verifies the H-differentiability of the above m.d. functional $T_n$.


We shall now derive the asymptotic distribution of $\beta^+$ under any sequence $\{Q^n\}\in\mathcal{M}_n(P^n,a)$, under the conditions of Lemma 5.6c.2. For that reason consider $\mathcal{Z}_n$ of (16). Note that under $Q^n$, $(1/2)\mathcal{Z}_n$ is the sum of independent centered triangular random arrays, and the boundedness of $\psi$ and (5.5.1) imply, via the L–F CLT, that $C_n^{-1/2}(1/2)\mathcal{Z}_n\to_dN(0,I_{p\times p})$, where

$C_n:=4^{-1}E\mathcal{Z}_n\mathcal{Z}_n'=\sum_iAx_{ni}x_{ni}'A\,\sigma_{ni}^2$, $\sigma_{ni}^2:=\operatorname{Var}\{\psi(e_{ni})|F_{ni}\}$, $1\le i\le n$.

But the boundedness of $\psi$ implies that $\max_i|\sigma_{ni}^2-\sigma^2|\to0$ for every $Q^n\in\mathcal{M}_n(P^n,a)$, where $\sigma^2:=\operatorname{Var}\{\psi(e_1)|F\}$. Therefore $(2\sigma)^{-1}\mathcal{Z}_n\to_dN(0,I_{p\times p})$. Consequently, from (15),

$\lim_{a\to\infty}\lim_n\sup_{\{Q^n\}\in\mathcal{M}_n(P^n,a)}E\{\ell(\|A_n(\beta^+-T_n(\beta,Q^n_\beta))\|)\,|\,Q^n\}=E\,\ell(\|Z\|)$

for every bounded nondecreasing function $\ell$, where $Z$ is a $N(0,I_{p\times p})$ r.v. This and Lemma 5.6c.2 show that the sequence of the m.d. estimators $\{\beta^+\}$ achieves the lower bound of (3) and hence is l.a.m. □


Remark 5.6c.1. It is an interesting problem to see if one can remove the requirement of the finiteness of the integrating measure $G$ in the above l.a.m. result. The l.a.m. property of $\{\hat\beta\}$ can be obtained in a similar fashion. For an alternative definition of l.a.m. see Millar (1984) where, among other things, he proves the l.a.m. property, in his sense, of $\{\hat\beta\}$ for $p=1$.

A problem: To this date an appropriate extension of Beran (1978) to the model (1.1.1) does not seem to be available. Such an extension would provide asymptotically fully efficient estimators at every symmetric density with finite Fisher information and would also be l.a.m. □


Note: The contents of this chapter are based on the works of Williamson (1979, 1982), Koul (1979, 1980, 1984, 1985a,b), Koul and DeWet (1983), Basawa and Koul (1988) and Dhar (1991a, b).


CHAPTER 6 
GOODNESS-OF-FIT TESTS FOR THE ERRORS 


6.1. INTRODUCTION 
Consider the model (1.1.1) and the goodness-of-fit hypothesis

(1) $H_0$: $F_{ni}=F_0$, $F_0$ a known continuous d.f.

This is a classical problem, yet not much is readily available in the literature. Observe that even if $F_0$ is known, having an unknown $\beta$ in the model poses a problem in constructing tests of $H_0$ that would be implementable, at least asymptotically.


One test of $H_0$ could be based on $D_1$ of (1.3.3). This test statistic is suggested by looking at the estimated residuals and mimicking the one-sample location model technique. In general, its large sample distribution depends on the design matrix. In addition, it does not reduce to the Kiefer tests of goodness-of-fit in the $k$-sample location problem when (1.1.1) is reduced to this model. The test statistics that overcome these deficiencies are those that are based on the w.e.p.'s $V$ of (1.1.2). For example, the two candidates that will be considered in this chapter are

(2) $D_2:=\sup_y|W^0(y,\hat\beta)|$, $D_3:=\sup_y\|W^0(y,\hat\beta)\|$,

where $\hat\beta$ is an estimator of $\beta$ and

(3) $W^0(y,t):=(X'X)^{-1/2}\{V(y,t)-X'\mathbf{1}F_0(y)\}$, $y\in\mathbb{R}$, $t\in\mathbb{R}^p$, $\mathbf{1}':=(1,\ldots,1)_{1\times n}$.

Other classes of tests are based on $K_0(\hat\beta)$ and $\inf\{K_0(t), t\in\mathbb{R}^p\}$, where $K_0$ is equal to the $K$ of (1.3.2) with $W$ replaced by $W^0$ in there.
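To make (2) and (3) concrete, here is a sketch that evaluates $D_2$ and $D_3$ from data. It assumes $V(y,t)=\sum_ix_{ni}I(Y_{ni}\le y+x_{ni}'t)$ as in (1.1.2), reads $|\cdot|$ in $D_2$ as the maximum absolute coordinate (the reading under which its limit in Corollary 6.2a.1 is a maximum over coordinates), and assumes the residuals have no ties. Because $F_0$ is continuous and $V(\cdot,\hat\beta)$ is a step function jumping only at the residuals, both suprema are attained at a residual or at its left limit. The function name and interface are hypothetical, and the true $\beta$ is plugged in only for the demonstration.

```python
import numpy as np
from statistics import NormalDist

def sup_distance_stats(X, Y, beta_hat, F0):
    """Return (D2, D3) for W0(y, t) = (X'X)^{-1/2} {V(y, t) - X'1 F0(y)}."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w, V = np.linalg.eigh(X.T @ X)
    A = V @ np.diag(w ** -0.5) @ V.T              # (X'X)^{-1/2}, symmetric
    resid = Y - X @ np.asarray(beta_hat, dtype=float)
    order = np.argsort(resid)
    r, Xs = resid[order], X[order]
    right = np.cumsum(Xs, axis=0)                 # V(y, beta_hat) at y = r_(k)
    left = right - Xs                             # V(y, beta_hat) just below r_(k)
    total = X.sum(axis=0)                         # X'1
    F = np.array([F0(v) for v in r])[:, None]     # F0 at the ordered residuals
    W = np.vstack([(right - total * F) @ A, (left - total * F) @ A])
    D2 = float(np.abs(W).max())                   # max over y and coordinates
    D3 = float(np.linalg.norm(W, axis=1).max())   # Euclidean norm, max over y
    return D2, D3

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
beta = np.array([0.5, 2.0])
Y = X @ beta + rng.normal(size=n)
print(sup_distance_stats(X, Y, beta, NormalDist().cdf))
```

With a single column of ones, $D_2=D_3$ reduces to $n^{1/2}$ times the one-sample Kolmogorov distance between the residual empirical d.f. and $F_0$, which is a convenient check.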


Section 6.2a discusses the asymptotic null distributions (a.n.d.'s) of the supremum distance test statistics for $H_0$ when $\beta$ is estimated arbitrarily and asymptotically efficiently. Also discussed in this section are some asymptotically distribution free (a.d.f.) tests for $H_0$. Some comments about the asymptotic power of these tests appear at the end of this section. Section 6.2b discusses a smooth bootstrap distribution of $D_3$.

Analogous results for tests of $H_0$ based on $L_2$-distances involving the ordinary and weighted empirical processes appear in Section 6.3.

A closely related problem to $H_0$ is that of testing the composite hypothesis

(4) $\tilde H_0$: $F_{ni}(\cdot)=F_0(\cdot/\sigma)$, $\sigma>0$, $F_0$ a known d.f.

Modifications of various tests of $H_0$ and their asymptotic null distributions are discussed in Section 6.4.

Another problem of interest is to test the composite hypothesis of symmetry of the errors:

(5) $H_s$: $F_{ni}=F$, $1\le i\le n$, $n\ge1$; $F$ a d.f. symmetric around 0.

This is a more general hypothesis than $H_0$. In some situations it may be of interest to test $H_s$ before testing, say, that the errors are normally distributed. Rejection of $H_s$ would a priori exclude any possibility of normality of the errors. A test of $H_s$ could be based on

(6) $D_{1s}:=\sup_y|W_1^+(y,\hat\beta)|$,

where

(7) $W_1^+(y,t):=n^{-1/2}\sum_i[I(Y_{ni}\le y+x_{ni}'t)-I(-Y_{ni}<y-x_{ni}'t)]=n^{1/2}\{H_n(y,t)-1+H_n(-y,t)\}$, $y\in\mathbb{R}$, $t\in\mathbb{R}^p$,

with $H_n$ as in (1.2.1). Other candidates are

(8) $D_{2s}:=\sup_y|W^+(y,\hat\beta)|$,
$D_{3s}:=\sup_y\|W^+(y,\hat\beta)\|=\sup_y[V^{+\prime}(y,\hat\beta)(X'X)^{-1}V^+(y,\hat\beta)]^{1/2}$,

where

(9) $W^+:=AV^+$, $V^{+\prime}:=(V_1^+,\ldots,V_p^+)$, with
$V_j^+(y,t):=V_j(y,t)-\sum_{i=1}^nx_{nij}+V_j(-y,t)$, $1\le j\le p$, $y\in\mathbb{R}$, $t\in\mathbb{R}^p$.

Yet other tests can be obtained by considering various $L_2$-norms involving $W_1^+$ and $W^+$. The asymptotic null distributions of all of these test statistics are given in Section 6.5.

It will be observed that the tests based on the vectors $W^0$ and $W^+$ of w.e.p.'s have asymptotic distributions similar to their counterparts in the $k$-sample location models. Consequently, these tests can use, at least for large samples, the null distribution tables that are available for such problems. For the sake of completeness, some of these tables are reproduced in the following sections.
6.2. THE SUPREMUM DISTANCE TESTS

6.2a. Asymptotic Null Distributions.

To begin with, define, for $0\le t\le1$, $s\in\mathbb{R}^p$,

(1) $W_1(t,s):=n^{1/2}\{H_n(F_0^{-1}(t),s)-t\}$, $W(t,s):=W^0(F_0^{-1}(t),s)$.

Let

(2) $\hat W_1(t):=W_1(t,\hat\beta)$, $\hat W(t):=W(t,\hat\beta)$, $0\le t\le1$.

Clearly, if $F_0$ is continuous, then the distributions of $D_j$, $j=1,2,3$, are the same as those of $\|\hat W_1\|_\infty$, $\sup\{|\hat W(t)|;\ 0\le t\le1\}$, $\sup\{\|\hat W(t)\|;\ 0\le t\le1\}$, respectively. Consequently, from Corollaries 2.3.3 and 2.3.5 one readily obtains the following Theorem 6.2a.1. Recall the conditions (F01) and (NX) from Corollary 2.3.1 and just after Corollary 2.3.2.


Theorem 6.2a.1. Suppose that the model (1.1.1) and $H_0$ hold. In addition, assume that $X$ and $F_0$ satisfy (NX) and (F01), and that $\hat\beta$ satisfies

(3) $\|A^{-1}(\hat\beta-\beta)\|=O_p(1)$.

Then

(4) $\sup|W_1(t,\hat\beta)-\{W_1(t,\beta)+q_0(t)\,n^{1/2}\bar x_n'A\cdot A^{-1}(\hat\beta-\beta)\}|=o_p(1)$,
(5) $\sup\|W(t,\hat\beta)-\{W(t,\beta)+q_0(t)A^{-1}(\hat\beta-\beta)\}\|=o_p(1)$,

where $q_0:=f_0(F_0^{-1})$ and the supremum is over $0\le t\le1$. □

Write $W_1(t)$, $W(t)$ for $W_1(t,\beta)$, $W(t,\beta)$, respectively. The following lemma gives the weak limits of $W_1$ and $W$ under $H_0$.


Lemma 6.2a.2. Suppose that the model (1.1.1) and $H_0$ hold. Then

(7) $W_1\Rightarrow B$, $B$ a Brownian bridge in $C[0,1]$.

In addition, if $X$ satisfies (NX), then

(8) $W\Rightarrow\mathbf{B}':=(B_1,\ldots,B_p)$,

where $B_1,\ldots,B_p$ are independent Brownian bridges in $C[0,1]$.

Proof. The result (7) is well known or may be deduced from Corollary 2.2a.2. The same corollary implies (8). To see this, rewrite

(9) $W(t)=A\sum_ix_{ni}\{I(e_{ni}\le F_0^{-1}(t))-t\}=AX'\alpha_n(t)$,

where $\alpha_n(t):=(\alpha_{n1}(t),\ldots,\alpha_{nn}(t))'$, with $\alpha_{ni}(t):=I(e_{ni}\le F_0^{-1}(t))-t$, $1\le i\le n$, $0\le t\le1$. Clearly, under $H_0$,

(10) $EW=0$, $\operatorname{Cov}(W(s),W(t))=(s\wedge t-st)I_{p\times p}$, $0\le s,t\le1$.

Now apply Corollary 2.2a.2 $p$ times, the $j$th time to the w.e.p. with the weights and r.v.'s given as in (11) below, $1\le j\le p$, to conclude (8):

(11) the weights $\mathbf{d}_{(j)}:=$ the $j$th column of $XA$, the r.v.'s $X_{ni}=e_{ni}$, and $F=F_0$, $1\le j\le p$.

See (2.3.33) and (2.3.34) for ensuring the applicability of Corollary 2.2a.2 in this case.

Remark 6.2a.1. From (5) it follows that if $\hat\beta$ is chosen so that the finite dimensional asymptotic distributions of $\{W(t)+q_0(t)A^{-1}(\hat\beta-\beta);\ 0\le t\le1\}$ do not depend on the design matrix, then the a.n.d.'s of $D_j$, $j=2,3$, will also not depend on the design matrix. The classes of estimators that satisfy this requirement include M-, R- and m.d. estimators. Consequently, in these cases, the a.n.d.'s of $D_j$, $j=2,3$, are design free.

On the other hand, from (4), the a.n.d. of $D_1$ depends on the design matrix through $n^{1/2}\bar x_n'A$. Of course, if $\bar x_n$ equals zero, then this distribution is free from $F_0$ and the design matrix. □


Remark 6.2a.2. The effect of estimating the parameter $\beta$ efficiently. To describe this, assume that

(12) $F_0$ has an a.c. density $f_0$ with a.e. derivative $\dot f_0$ satisfying $0<I_0:=\int(\dot f_0/f_0)^2dF_0<\infty$.

Define

(13) $s_{ni}:=-\dot f_0(e_{ni})/f_0(e_{ni})$, $1\le i\le n$; $\mathbf{s}_n:=(s_{n1},\ldots,s_{nn})'$,

and assume that the estimator $\hat\beta$ satisfies

(14) $A^{-1}(\hat\beta-\beta)=I_0^{-1}AX'\mathbf{s}_n+o_p(1)$.

Then the approximating processes in (4) and (5), respectively, become

(15) $\tilde W_1(t):=W_1(t)+q_0(t)\,n^{1/2}\bar x_n'A\,I_0^{-1}AX'\mathbf{s}_n$,
$\tilde W(t):=W(t)+q_0(t)I_0^{-1}AX'\mathbf{s}_n$, $0\le t\le1$.

Using the independence of the errors, one directly obtains

(16) $E\tilde W_1(s)\tilde W_1(t)=s(1-t)-n\bar x_n'(X'X)^{-1}\bar x_n\,q_0(s)q_0(t)I_0^{-1}$,
$E\tilde W(s)\tilde W'(t)=\{s(1-t)-q_0(s)q_0(t)I_0^{-1}\}I_{p\times p}$, $0\le s\le t\le1$.

The calculations in (16) use the facts that $E\mathbf{s}_n=0$ and $E\alpha_n(t)\mathbf{s}_n'=-q_0(t)I_{n\times n}$.

From (16), Theorem 2.2a.1(i) applied to the quantities given in (11), and the uniform continuity of $q_0$, which is implied by (12), it readily follows that $\tilde W\Rightarrow\mathbf{Z}:=(Z_1,\ldots,Z_p)'$, where $Z_1,\ldots,Z_p$ are continuous independent Gaussian processes, each having the covariance function

(17) $\rho(s,t):=s(1-t)-q_0(s)q_0(t)I_0^{-1}$, $0\le s\le t\le1$.

Consequently,

(18) $D_2\to_d\sup\{|\mathbf{Z}(t)|;\ 0\le t\le1\}$, $D_3\to_d\sup\{\|\mathbf{Z}(t)\|;\ 0\le t\le1\}$.
This shows that the a.n.d.'s of $D_j$, $j=2,3$, are design free when an asymptotically efficient estimator of $\beta$ is used in constructing the residuals, while the same cannot be said about $D_1$.
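For $F_0=N(0,1)$ one has $I_0=1$ and $q_0(t)=\varphi(\Phi^{-1}(t))$, so the covariance (17) can be evaluated directly with the Python standard library. The sketch below (function names hypothetical) compares the variance $\rho(t,t)$ of each $Z_j(t)$ with the Brownian-bridge variance $t(1-t)$, showing how much the efficient estimation of $\beta$ reduces the variance of the limiting process.

```python
from statistics import NormalDist

STD = NormalDist()

def q0(t):
    """q0(t) = f0(F0^{-1}(t)) for F0 = N(0, 1); requires 0 < t < 1."""
    return STD.pdf(STD.inv_cdf(t))

def rho(s, t, fisher_info=1.0):
    """Covariance (17) of each Z_j; fisher_info = I0, equal to 1 for N(0, 1)."""
    s, t = min(s, t), max(s, t)
    return s * (1.0 - t) - q0(s) * q0(t) / fisher_info

for t in (0.1, 0.25, 0.5):
    print(t, round(t * (1 - t), 4), round(rho(t, t), 4))
```

For the logistic $F_0$ one would instead use $q_0(t)=t(1-t)$ and $I_0=1/3$.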
Moreover, recall, say from Durbin (1975), that when testing for $H_0$ in the one-sample location model, the Gaussian process $Z_1$ with the covariance function $\rho$ appears as the limiting process for the analogue of $D_1$. Note also that in this case $D_1=D_2=D_3$. However, it is the test based on $D_3$ that provides the right extension of the one-sample Kolmogorov goodness-of-fit test to the linear regression model (1.1.1) for testing $H_0$, in the sense that it includes the $k$-sample goodness-of-fit Kolmogorov type test of Kiefer (1959). That is, if we specialize (1.1.1) to the $k$-sample location model, then $D_3$ reduces to the $T_p$ of Section 2 of Kiefer, modulo the fact that we have to estimate $\beta$.

The distribution of $\sup\{|Z_1(t)|;\ 0\le t\le1\}$ has been studied by Durbin (1976) when $F_0$ equals $N(0,1)$ and some other distributions. Consequently, one can use these results together with the independence of $Z_1,\ldots,Z_p$ to implement the tests based on $D_2$, $D_3$ in a routine fashion. □
Remark 6.2a.3. Asymptotically distribution free (a.d.f.) tests. Here we shall construct estimators of $\beta$ such that the above tests become a.d.f. for testing $H_0$. To that effect, write $X_n$ and $A_n$ for $X$ and $A$ to emphasize their dependence on $n$. Recall that $n$ is the number of rows in $X_n$. Let $m=m_n$ be a sequence of positive integers, $m_n\le n$. Let $\bar X_n$ be the $m_n\times p$ matrix obtained from some $m_n$ rows of $X_n$. A way to choose $m_n$ and these rows will be discussed later on. Relabel the rows of $X_n$ so that its first $m_n$ rows are the rows of $\bar X_n$, and let $\{e_{ni}^*, 1\le i\le m_n\}$, $\{Y_{ni}^*, 1\le i\le m_n\}$ denote the corresponding errors and observations, respectively. Define

(19) $s_{ni}^*:=-\dot f_0(e_{ni}^*)/f_0(e_{ni}^*)$, $1\le i\le m_n$; $\mathbf{s}_n^*:=(s_{n1}^*,\ldots,s_{nm_n}^*)'$,
$T_n:=I_0^{-1}\bar A_n\bar X_n'\mathbf{s}_n^*$, $\bar A_n:=(\bar X_n'\bar X_n)^{-1/2}$.

Observe that under (12),

(20) $ET_n=0$, $ET_nT_n'=I_0^{-1}I_{p\times p}$.

Consider the assumption

(21) $m_n\le n$, $m_n\to\infty$ such that $(X_n'X_n)^{1/2}(\bar X_n'\bar X_n)^{-1}(X_n'X_n)^{1/2}\to2I_{p\times p}$.

The assumptions (21) and (NX) together imply

(22) $\max_{1\le i\le m_n}x_{ni}'\bar A_n\bar A_nx_{ni}=o(1)$.

Consequently, one obtains, with the aid of the Cramér–Wold device and the L–F CLT, that

(23) $T_n\to_dN(0,I_0^{-1}I_{p\times p})$.

Now use $\{(x_{ni},Y_{ni}^*);\ 1\le i\le m_n\}$ to construct an estimator $\hat\beta_n$ of $\beta$ such that

(24) $\bar A_n^{-1}(\hat\beta_n-\beta)=T_n+o_p(1)$.

Note that, by (21) and (23), $\|A_n^{-1}\bar A_n\|=O(1)$ and hence

(25) $A_n^{-1}(\hat\beta_n-\beta)=A_n^{-1}\bar A_nT_n+o_p(1)$.

Therefore it follows that $\hat\beta_n$ satisfies (3). Define

$K^*(t):=W(t)+A_n^{-1}\bar A_nT_n\,q_0(t)$, $0\le t\le1$.
From (5) and (25) it now readily follows that

(26) $\sup_t\|W(t,\hat\beta_n)-K^*(t)\|=o_p(1)$.

We shall now show that

(27) $K^*\Rightarrow\mathbf{B}$, with $\mathbf{B}$ as in (8).

First, consider the covariance function of $K^*$. By the independence of the errors and by (12) one obtains that

$E\{I(e_{ni}\le F_0^{-1}(t))-t\}\dot f_0(e_{nj}^*)/f_0(e_{nj}^*)=0$, $i\ne j$, $1\le i\le n$, $1\le j\le m_n$.

Use this and direct calculations to obtain that

(28) $EK^*(s)K^{*\prime}(t)=s(1-t)I_{p\times p}-I_0^{-1}q_0(s)q_0(t)[2I_{p\times p}-(X_n'X_n)^{1/2}(\bar X_n'\bar X_n)^{-1}(X_n'X_n)^{1/2}]$, $0\le s\le t\le1$.

Thus (21) implies that

(29) $EK^*(s)K^{*\prime}(t)\to s(1-t)I_{p\times p}$, $\forall\,0\le s\le t\le1$.

Because of (8) and the uniform continuity of $q_0$, the relative compactness of the sequence $\{K^*\}$ is a priori established, thereby completing the proof of (27). Consequently, we obtain the following

Corollary 6.2a.1. Under (1.1.1), $H_0$, (NX), (12), (21) and (24),

$D_{2n}\to_d\sup_{0\le t\le1}\max_{1\le j\le p}|B_j(t)|$, $D_{3n}\to_d\sup_{0\le t\le1}\big(\sum_{j=1}^pB_j^2(t)\big)^{1/2}$,

where $D_{jn}$ stands for the $D_j$ with $\hat\beta=\hat\beta_n$, $j=2,3$. □


It thus follows, from the independence of the Brownian bridges $\{B_j, 1\le j\le p\}$ and Theorem V.3.6.1 of Hájek and Šidák (1967), that the test that rejects $H_0$ when $D_{2n}>d$ is of asymptotic size $\alpha$, provided $d$ is determined from the relation

(30) $2\sum_{j=1}^{\infty}(-1)^{j+1}e^{-2j^2d^2}=1-(1-\alpha)^{1/p}$.
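Relation (30) is straightforward to solve numerically. The sketch below evaluates the classical series for $P(\sup_t|B(t)|>d)$, truncated at 100 terms (an illustrative choice that is far more than enough for $d\ge0.3$), and finds $d$ by bisection. For $p=1$ it should reproduce column 1 of Table 1; for $p>1$ it gives critical values for $D_{2n}$, which differ from the $D_{3n}$ entries of Table 1. The function names are hypothetical.

```python
from math import exp

def bridge_sup_tail(d, terms=100):
    """P(sup_t |B(t)| > d) = 2 * sum_{j>=1} (-1)^{j+1} exp(-2 j^2 d^2)."""
    return 2.0 * sum((-1) ** (j + 1) * exp(-2.0 * j * j * d * d) for j in range(1, terms + 1))

def critical_value_D2(alpha, p, lo=0.3, hi=5.0, iters=100):
    """Solve (30) for d by bisection; the tail probability decreases in d."""
    target = 1.0 - (1.0 - alpha) ** (1.0 / p)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bridge_sup_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (1, 2, 5):
    print(p, round(critical_value_D2(0.05, p), 4))
```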
Let $T_p$ stand for the limiting r.v. of $D_{3n}$. The distribution of $T_p$ has been tabulated by Kiefer (1959) for $1\le p\le5$. Delong (1983) has also computed these tables for $1\le p\le7$. The following table is obtained from Kiefer for $1\le p\le5$ and Delong for $p=6,7$, for the sake of completeness. The last place digit is rounded from their entries.


  α       p=1     p=2     p=3     p=4     p=5     p=6     p=7
.001    1.9495  2.1516  2.3030  2.4301  2.5422  2.6437  2.1373
.005    1.7308  1.9417  2.0977  2.2280  2.3424  2.445   2.940
.01     1.6276  1.8427  2.0009  2.1326  2.2480  2.3925  2.4925
.02     1.5174  1.7370  1.8974  2.0305  2.1470  2.252   2.390
.025    1.480   1.702   1.8625  1.9961  2.116   2.217   2.315
.05     1.3581  1.5838  1.7473  1.8823  2.0001  2.1053  2.2031
.10     1.2239  1.4540  1.6196  1.7559  1.8746  1.981   2.0788
.15     1.1380  1.3703  1.5370  1.6740  1.7930  1.900   1.9977
.20     1.0728  1.3061  1.4734  1.6107  1.730   1.8352  1.9349
.25     1.0192  1.2530  1.4205  1.5579  1.6773  1.785   1.8825

Table 1: Values $d$ such that $P(T_p>d)\approx\alpha$ for $1\le p\le7$. Obtained from Kiefer (1959) and Delong (personal communication).


Note that for $p = 1$, $D_{2m}$ and $D_{3m}$ are the same tests and the $d$ of (30) is the same as the $d$ of column 1 of Table 1 for the various values of $\alpha$.
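As a numerical check on (30), the following Python sketch solves for $d$ by bisection. It uses only the standard series for $P(\sup_t|B(t)| > d)$ that forms the left side of (30); the function names are ours, not the book's. For $p = 1$ it should reproduce the first column of Table 1 (for example $d\approx 1.3581$ at $\alpha = .05$).

```python
import math


def kolmogorov_tail(d, terms=100):
    """Left side of (30): P(sup_t |B(t)| > d) for a Brownian bridge B."""
    return 2.0 * sum((-1) ** (j + 1) * math.exp(-2.0 * j * j * d * d)
                     for j in range(1, terms + 1))


def critical_value_d2m(alpha, p, lo=0.3, hi=5.0, tol=1e-10):
    """Solve (30) for d, so that the D_2m test has asymptotic size alpha."""
    target = 1.0 - (1.0 - alpha) ** (1.0 / p)
    # kolmogorov_tail is decreasing in d, so plain bisection suffices.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (1, 2, 5):
    print(p, round(critical_value_d2m(0.05, p), 4))
```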
The entries in Table 1 can be used to get the asymptotic critical level of $D_{3m}$ for $1\le p\le 7$. Thus for $p = 5$, $\alpha = .05$, the test that rejects $H_0$ when $D_{3m} > 2.0001$ is of asymptotic size .05, no matter what $F_0$ is within the class of d.f.'s satisfying (12).
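Table 1 can be spot-checked by simulation. The sketch below is our own illustration, not Kiefer's or Delong's method: it approximates $T_p = \sup_t\{\sum_jB_j^2(t)\}^{1/2}$ by discretizing $p$ independent Brownian bridges on a grid. The grid slightly underestimates the supremum, so the simulated quantiles should fall a little below the tabulated values.

```python
import numpy as np


def simulate_Tp(p, reps=2000, steps=500, chunk=250, seed=0):
    """Monte Carlo draws of sup_t ||B(t)|| for a p-vector of independent bridges."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, steps + 1) / steps
    draws = []
    for start in range(0, reps, chunk):
        m = min(chunk, reps - start)
        # Brownian motion on the grid, then B(t) = W(t) - t W(1).
        w = np.cumsum(rng.standard_normal((m, steps, p)), axis=1) / np.sqrt(steps)
        bridge = w - t[None, :, None] * w[:, -1:, :]
        draws.append(np.sqrt((bridge ** 2).sum(axis=2)).max(axis=1))
    return np.concatenate(draws)


for p in (1, 5):
    print(p, np.quantile(simulate_Tp(p), 0.95))  # compare with the .05 row of Table 1
```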


Next, to make the $D_1$-test a.d.f., let $r_n$, $n\ge 1$, be a sequence of positive integers, $r_n\le n$, $r_n\to\infty$. Let $X_r$ denote the $r_n\times p$ matrix obtained from some $r_n$ rows of $X_n$. Relabel the rows of $X_n$ so that the first $r_n$ rows are in $X_r$, and let $Y_r$, $e_r$ denote the corresponding $Y_i$'s and $e_i$'s. Let $A_r = (X_r'X_r)^{-1/2}$. Assume that

(31)  (i) $\|n^{1/2}\bar x_n'A_r\| = O(1)$, and
      (ii) $|n\bar x_n'(X_r'X_r)^{-1}\bar x_n - 2r_n\bar x_n'(X_r'X_r)^{-1}\bar x_r| = o(1)$.

Let $\hat\beta_r$ be an estimator of $\beta$ based on $\{(x_{ni}, Y_{ni}), 1\le i\le r_n\}$ such that

(32)  $A_r^{-1}(\hat\beta_r - \beta) = T_r + o_p(1), \qquad T_r := I_0^{-1}A_rX_r's_r,$

where $s_{ni} = -f_0'(e_{ni})/f_0(e_{ni})$, $1\le i\le r_n$, and $s_r = (s_{ni}, 1\le i\le r_n)'$. Define

$K_1^*(t) := W_1(t) + n^{1/2}\bar x_n'A_rT_r\,q_0(t), \quad 0\le t\le 1.$


Similar to (28), we obtain, for $s\le t$, that

$EK_1^*(s)K_1^*(t) = s(1-t) + I_0^{-1}q_0(s)q_0(t)\{n\bar x_n'(X_r'X_r)^{-1}\bar x_n - 2r_n\bar x_n'(X_r'X_r)^{-1}\bar x_r\}.$

Argue as for Corollary 6.2a.1 to conclude

Corollary 6.2a.2. Under (1.1.1), $H_0$, (NX), (12), (31) and (32),

(33)  $D_{1r}\to_d\sup_{0\le t\le 1}|B(t)|,$

where $D_{1r}$ is the $D_1$ with $\hat\beta = \hat\beta_r$. □
Remark 6.2a.4. Assumptions (21) and (31). To begin with, note that if

(34)  $\lim_n n^{-1}(X_n'X_n)$ exists and is positive definite,

then (21) is equivalent to

(35)  $n\,m_n^{-1}\to 2.$

If, in addition to (34), one also assumes

(36)  $\lim_n\bar x_n$ exists and is finite,

then (31) is equivalent to

(37)  $n\,r_n^{-1}\to 2.$

There are many designs that satisfy (34) and (36). These include the one-way classification, randomized block and factorial designs, among others.
The choice of the $m_n$ and $r_n$ rows is, of course, crucial and obviously depends on the design matrix. In the one-way classification design with $p$ treatments and $n_j$ observations from the $j$th treatment, it is recommended to choose the first $m_{nj} = [n_j/2]$ observations from the $j$th treatment, $1\le j\le p$, to estimate $\beta$. Here $m_n = m_{n1} + \dots + m_{np} = [n/2]$. One chooses $r_{nj} = m_{nj}$, $1\le j\le p$, $r_n = \sum_jr_{nj} = [n/2]$. The choice of $m_n$ and $r_n$ is made similarly in the randomized block design and other similar designs. If one had several replications of a design whose design matrix satisfies (34) and (36), then one could use the first half of the replications to estimate $\beta$ and all replications to carry out the test.

Thus, in those cases where designs satisfy (34) and (36), the above construction of the a.d.f. tests is similar to the half-sample technique in the one-sample problem as found in Rao (1972) or Durbin (1976).

Of course, there are designs of interest where (34) and (36) do not hold. An example is $p = 1$, $x_{ni}\equiv i$. Here $X_n'X_n = O(n^3)$. If one decides to choose the first $m_n$ ($r_n$) $x_i$'s, then (21) and (31) are equivalent to requiring $(m_n/n)^3\to 1/2$ and $(r_n/n)^2\to 1/2$. Thus, here $D_{2m}$ or $D_{3m}$ would use 79% of the observations to estimate $\beta$, while $D_{1r}$ would use 71%. On the other hand, if one decides to use the last $m_n$ ($r_n$) $x_i$'s, then $D_2$, $D_3$ will use the last 21% of the observations while $D_1$ will use the last 29% of the observations to estimate $\beta$. Of course, all of these tests would be based on the entire sample.
In general, to avoid the above kind of problem, one may wish to use, from the practical point of view, some other characteristics of the design matrix in deciding which $m_n$ ($r_n$) rows to choose. One criterion may be to choose those $m_n$ ($r_n$) rows that approximately maximize $m_n/n$ ($r_n/n$) subject to (21) ((31)). □
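The percentages quoted for $x_{ni}\equiv i$ can be reproduced numerically. The Python sketch below evaluates finite-$n$ versions of (21) and (31)(ii) for $p = 1$, using either the first or the last rows, and picks the $m_n$ ($r_n$) that brings each condition closest to its limit. The function name and the choice $n = 10000$ are ours.

```python
import numpy as np


def estimation_fractions(n, use_last=False):
    """Return (m_n/n, r_n/n) for p = 1, x_i = i, per (21) and (31)(ii)."""
    x = np.arange(1, n + 1, dtype=float)
    if use_last:
        x = x[::-1]  # the "first" k rows are now the last k design points
    sxx = np.cumsum(x ** 2)  # X_k'X_k for the selected k rows
    sx = np.cumsum(x)        # k * xbar_k
    xbar_n = x.mean()
    # (21) with p = 1: X_n'X_n / X_m'X_m -> 2.
    m = int(np.argmin(np.abs(sxx[-1] / sxx - 2.0))) + 1
    # (31)(ii): n xbar_n^2 / X_r'X_r - 2 r xbar_n xbar_r / X_r'X_r -> 0.
    crit = (n * xbar_n ** 2 - 2.0 * xbar_n * sx) / sxx
    r = int(np.argmin(np.abs(crit))) + 1
    return m / n, r / n


print(estimation_fractions(10000))                 # about (0.79, 0.71)
print(estimation_fractions(10000, use_last=True))  # about (0.21, 0.29)
```

In the limit these are $2^{-1/3}$, $2^{-1/2}$, $1 - 2^{-1/3}$ and $1 - 2^{-1/2}$, matching the 79%, 71%, 21% and 29% above.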


Remark 6.2a.5. Construction of $\hat\beta_m$ and $\hat\beta_r$. If $F_0$ is a d.f. for which the maximum likelihood estimator (m.l.e.) of $\beta$ has a limiting distribution under (NX) and (12), then one should use this estimator based on $r_n$ ($m_n$) observations $\{(x_i, Y_i)\}$ for $D_1$ ($D_2$ or $D_3$). For example, if $F_0$ is the $N(0,1)$ d.f., then the obvious choices for $\hat\beta_r$ and $\hat\beta_m$ are the least squares estimators:

$\hat\beta_r := (X_r'X_r)^{-1}X_r'Y_r, \qquad \hat\beta_m := (X_m'X_m)^{-1}X_m'Y_m.$

Of course, there are many d.f.'s $F_0$ that satisfy the above conditions but for which the computation of the m.l.e. is not easy. One way to proceed in such cases is to use a one-step linear approximation. To make this precise, let $\hat\beta_m$ be an estimator of $\beta$ based on $\{(x_{ni}, Y_{ni}), 1\le i\le m_n\}$ such that

(38)  $A_m^{-1}(\hat\beta_m - \beta) = O_p(1).$

Define

(39)  $\psi_0(y) := -f_0'(y)/f_0(y), \quad y\in\mathbb R;$
      $\tilde s_{ni} := \psi_0(Y_{ni} - x_{ni}'\hat\beta_m), \quad 1\le i\le m_n; \qquad \tilde s_m := (\tilde s_{ni}, 1\le i\le m_n)';$
      $\tilde\beta_m := \hat\beta_m + I_0^{-1}A_mA_mX_m'\tilde s_m;$
      $V_m(y,t) := A_m\sum_{i=1}^{m_n}x_{ni}I(Y_{ni}\le y + x_{ni}'t), \quad y\in\mathbb R,\ t\in\mathbb R^p.$

Then

$A_mX_m'\tilde s_m = \int\psi_0(y)\,V_m(dy,\hat\beta_m).$

From this and (2.3.37), applied to $\{(x_{ni}, Y_{ni}), 1\le i\le m_n\}$, one readily obtains

Corollary 6.2a.3. Assume that (1.1.1) and $H_0$ hold. In addition, assume that $F_0$ is strictly increasing, satisfies (12) and is such that $\psi_0$ is a finite linear combination of nondecreasing bounded functions, and that $X$ and $\{\hat\beta_m\}$ satisfy (NX) and (38). Then $\{\tilde\beta_m\}$ of (39) satisfies (24) for any sequence $m_n\to\infty$, as $n\to\infty$.

Proof. Clearly,

$A_m^{-1}(\tilde\beta_m - \beta) = A_m^{-1}(\hat\beta_m - \beta) + I_0^{-1}A_mX_m'\tilde s_m.$

But integration by parts and (2.3.37) yield

$A_mX_m'\{\tilde s_m - s_m\} = \int\psi_0(y)\{V_m(dy,\hat\beta_m) - V_m(dy,\beta)\}$
$\qquad = -\int\{V_m(y,\hat\beta_m) - V_m(y,\beta)\}\,d\psi_0(y)$
$\qquad = -A_m^{-1}(\hat\beta_m - \beta)\int f_0(y)\,d\psi_0(y) + o_p(1)$
$\qquad = -A_m^{-1}(\hat\beta_m - \beta)I_0 + o_p(1).$ □
The above result is useful, e.g., when $F_0$ is logistic, Cauchy or double exponential. In the first case the m.l.e. is not easy to compute, but $F_0$ has a finite second moment. So take $\hat\beta_m$ to be the l.s.e. and then use (39) to obtain the final estimator to be used for testing. In the case of Cauchy, $\hat\beta_m$ may be chosen to be an R-estimator.
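For logistic $F_0$ the one-step estimator (39) is explicit, since $\psi_0(y) = -f_0'(y)/f_0(y) = 2F_0(y) - 1 = \tanh(y/2)$ and $I_0 = 1/3$. The Python sketch below starts from the l.s.e. on the first $m_n$ rows, as suggested above. The function name and the simulated data are illustrative assumptions.

```python
import numpy as np


def one_step_logistic(Xm, Ym):
    """One-step estimator (39) for logistic errors, started at the LSE."""
    beta_hat = np.linalg.lstsq(Xm, Ym, rcond=None)[0]  # preliminary estimator
    psi0 = np.tanh((Ym - Xm @ beta_hat) / 2.0)          # psi_0 = -f0'/f0 for logistic F0
    I0 = 1.0 / 3.0                                      # Fisher information of the logistic
    # A_m A_m = (X_m'X_m)^{-1}, so the correction is I0^{-1} (X_m'X_m)^{-1} X_m' psi0.
    return beta_hat + np.linalg.solve(Xm.T @ Xm, Xm.T @ psi0) / I0


rng = np.random.default_rng(2)
m = 20000
Xm = np.column_stack([np.ones(m), rng.uniform(0, 4, m)])
beta = np.array([1.0, -2.0])
Ym = Xm @ beta + rng.logistic(0.0, 1.0, m)
print(one_step_logistic(Xm, Ym))
```

For another $F_0$, replacing the score and $I_0$ gives the corresponding one-step estimator, provided $\psi_0$ meets the conditions of Corollary 6.2a.3.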


Clearly, there is an analogue of the above corollary involving $\{\hat\beta_r\}$ that would satisfy (32). □
6.2b. Bootstrap Distributions


In this subsection we shall obtain a weak convergence result about bootstrapped w.e.p.'s and then apply it to yield bootstrap distributions of some of the above tests.

Let (1.1.1) with $e_{ni}\equiv e_i$ and $H_0$ hold. Let $E_0$ and $P_0$ denote the expectation and probability, respectively, under these assumptions. In addition, throughout this section we shall assume that (F01), (F02) and (NX) hold.


Recall the definitions of $W_1$, $W$ from (6.2a.1), (6.2a.2). Let $\hat\beta$ be an M-estimator of $\beta$ corresponding to a bounded nondecreasing right continuous score function $\psi$ such that

(1)  $\int\psi\,dF_0 = 0, \qquad \int f_0\,d\psi > 0.$

Upon specializing (4.2a.8) to the current setup one readily obtains

(2)  $A^{-1}(\hat\beta - \beta) = -\kappa\sum_iAx_{ni}\psi(e_i) + o_p(1), \quad (P_0),$

where $\kappa := 1/\int f_0\,d\psi$.
Let the approximating process obtained from (6.2a.5) and (2) be denoted by $\tilde W$, i.e.,

(3)  $\tilde W(t) := \sum_iAx_{ni}\{I(e_i\le F_0^{-1}(t)) - t - \kappa q_0(t)\psi(e_i)\}, \quad 0\le t\le 1.$

Define

(4)  $\sigma^2 := E_0\psi^2(e_1),$
     $g_0(t) := E_0\{I(e_1\le F_0^{-1}(t)) - t\}\psi(e_1) = \int I(x\le F_0^{-1}(t))\psi(x)\,dF_0(x), \quad 0\le t\le 1,$

and, for $0\le t\le u\le 1$,

(5)  $\rho_0(t,u) := t(1-u) - \kappa[q_0(t)g_0(u) + g_0(t)q_0(u)] + \kappa^2q_0(t)q_0(u)\sigma^2.$

Note that

(6)  $C_0(t,u) := E_0\{\tilde W(t)\tilde W(u)'\} = \rho_0(t,u)I_{p\times p}, \quad 0\le t\le u\le 1.$

Let $G_0 := (G_{01},\dots,G_{0p})'$ be a $p$-vector of independent Gaussian processes, each having the covariance function $\rho_0$. Thus, $EG_0(t)G_0(u)' = C_0(t,u)$. Since $\rho_0$ is continuous, $G_0\in\{C[0,1]\}^p$. Moreover, from Corollary 2.2a.1 applied $p$ times, the $j$th time to the entities $X_{ni} = e_i$, $F_{ni} = F_0$ and $d_{ni}$ = the $(i,j)$th entry of $XA$, $1\le j\le p$, $1\le i\le n$, and from the uniform continuity of $q_0$, it readily follows that

(7)  $\tilde W\Rightarrow G_0$ in $(\{D[0,1]\}^p, \rho)$.
Now, let $f_n$ be a density estimator based on $\{\hat e_{ni} := Y_{ni} - x_{ni}'\hat\beta, 1\le i\le n\}$ and let $F_n$ be the corresponding d.f. Let $\{e^*_{ni}; 1\le i\le n\}$ represent i.i.d. $F_n$ r.v.'s, i.e., $\{e^*_{ni}; 1\le i\le n\}$ is a random sample from the population $F_n$. Because $F_n$ is continuous, the resampling procedures based on it are usually called smooth bootstrap procedures. Let

(8)  $Y^*_{ni} = x_{ni}'\hat\beta + e^*_{ni}, \quad 1\le i\le n.$

Define the bootstrap estimator $\beta^*$ to be a solution $s\in\mathbb R^p$ of the equation

(9)  $\sum_iAx_{ni}\{\psi(Y^*_{ni} - x_{ni}'s) - E_n\psi(e^*_{n1})\} = 0,$

where $E_n$ is the expectation under $F_n$. Let $P_n$ denote the bootstrap probability under $F_n$. Finally, define

(10)  $S^*(t,u) := \sum_iAx_{ni}I(e^*_{ni}\le F_n^{-1}(t) + x_{ni}'Au), \quad 0\le t\le 1,\ u\in\mathbb R^p,$

and the vector of bootstrap w.e.p.'s

(11)  $W^*(t) := \sum_iAx_{ni}\{I(Y^*_{ni} - x_{ni}'\beta^*\le F_n^{-1}(t)) - t\}, \quad 0\le t\le 1.$

We also need

(12)  $\bar W^*(t) := \sum_iAx_{ni}\{I(e^*_{ni}\le F_n^{-1}(t)) - t\}, \quad 0\le t\le 1.$

Our goal is to show that $W^*$ converges weakly to $G_0$ in $(\{D[0,1]\}^p, \rho)$, a.s. Here a.s. refers to almost all error sequences $\{e_i; i\ge 1\}$. We in fact have the following

Theorem 6.2b.1. In addition to (1.1.1), $H_0$, (F01), (F02), (NX) and (1), assume that $\psi$ is a bounded nondecreasing right continuous score function and that the following hold.

(13)  For almost all error sequences $\{e_i; i\ge 1\}$, $f_n(x) > 0$ for almost all $x\in\mathbb R$, $n\ge 1$.
(14)  $\|f_n - f_0\|_\infty\to 0$, a.s. $(P_0)$.

Then, $\forall\,0 < b < \infty$,

(15)  $\sup\|S^*(t,u) - S^*(t,0) - uf_n(F_n^{-1}(t))\| = o_p(1), \quad (P_n)$, a.s.,




where the supremum is over $0\le t\le 1$, $\|u\|\le b$. Moreover, for almost all error sequences $\{e_i; i\ge 1\}$,

(16)  $A^{-1}(\beta^* - \hat\beta) = -\kappa_n\sum_iAx_{ni}\{\psi(e^*_{ni}) - E_n\psi(e^*_{n1})\} + o_p(1), \quad (P_n),$

and

(17)  $W^*\Rightarrow G_0$ in $(\{D[0,1]\}^p, \rho)$,

where $\kappa_n := 1/\int f_n\,d\psi$.
Proof. Fix an error sequence $\{e_i; i\ge 1\}$ for which

(14*)  $f_n(x) > 0$ for almost all $x\in\mathbb R$, and $\|f_n - f_0\|_\infty\to 0$.

The following arguments are carried out conditional on this sequence. Observe that $S^*(t,u)$ is a $p$-vector of w.e.p.'s $S_d(t,u)$ of (2.3.1) whose $j$th component has the following underlying entities:

(18)  $X_{ni} = e^*_{ni}, \quad F_{ni} = F_n, \quad c_{ni} = Ax_{ni}, \quad d_{ni} = a_{(j)}'x_{ni}, \quad 1\le i\le n,$

where, as usual, $a_{(j)}$ = the $j$th column of $A$, $1\le j\le p$.

Thus, (15) follows from $p$ applications of Theorem 2.3.1, the $j$th time applied to the above entities, provided we ensure the validity of the assumptions of that theorem. But $f_0$ uniformly continuous and (14) readily imply that $\{f_n, n\ge 1\}$ satisfies (2.3.3a,b). In view of (2.3.33), (2.3.34) and (NX), it follows that all other assumptions of Theorem 2.3.1 are satisfied. Hence, (15) follows from (2.3.6). In view of (13) we also obtain, from (2.3.7),

(19)  $\sup\|\tilde S^*(x,u) - \tilde S^*(x,0) - uf_n(x)\| = o_p(1), \quad (P_n),$

where $\tilde S^*(x,u) := S^*(F_n(x),u)$ and where the supremum is over $x\in\mathbb R$, $\|u\|\le b$. Now, (16) follows from (19) in precisely the same fashion as does (4.2a.8) from (2.3.7).

From (11), (15), (16) and (31) below, we readily obtain that, under $P_n$,

(20)  $W^*(t) = \sum_iAx_{ni}\{I(e^*_{ni}\le F_n^{-1}(t)) - t - \kappa_nq_n(t)[\psi(e^*_{ni}) - E_n\psi(e^*_{n1})]\} + o_p(1),$

where $q_n := f_n(F_n^{-1})$.
In analogy to (4) and (5), let $g_n$, $\rho_n$ stand for $g_0$, $\rho_0$ after $F_0$ is replaced by $F_n$ in these entities. Thus

(21)  $g_n(t) := E_n\{I(e^*_{n1}\le F_n^{-1}(t)) - t\}\psi(e^*_{n1}) = \int I(x\le F_n^{-1}(t))\psi(x)\,dF_n(x), \quad 0\le t\le 1,$

and, for $0\le t\le u\le 1$,

(22)  $\rho_n(t,u) := t(1-u) - \kappa_n[q_n(t)g_n(u) + g_n(t)q_n(u)] + \kappa_n^2q_n(t)q_n(u)\sigma_n^2,$

where $\sigma_n^2 := E_n[\psi(e^*_{n1}) - E_n\psi(e^*_{n1})]^2$. Let $\tilde W^*(t)$ denote the leading r.v. on the r.h.s. of (20). Observe that

(23)  $C_n(t,u) := E_n\{\tilde W^*(t)\tilde W^*(u)'\} = \rho_n(t,u)I_{p\times p}, \quad 0\le t\le u\le 1.$

(24)  Claim: $\rho_n(t,u)\to\rho_0(t,u), \quad \forall\,0\le t\le u\le 1.$


To prove (24), note that (14*) and Scheffé's Theorem (Lehmann, 1986, p. 573) imply that for the given error sequence $\{e_i; i\ge 1\}$,

(25)  $\delta_n := \|F_n - F_0\|_\infty\to 0,$

which, together with the continuity of $F_n$, yields

(26)  $\sup_{0\le t\le 1}|F_0(F_n^{-1}(t)) - t|\to 0.$

Also, observe that

$\sup_{0\le t\le 1}|f_n(F_n^{-1}(t)) - f_0(F_n^{-1}(t))|\le\|f_n - f_0\|_\infty\to 0,$

by (14*), and that

$|f_0(F_n^{-1}(t)) - f_0(F_0^{-1}(t))| = |q_0(F_0(F_n^{-1}(t))) - q_0(t)|, \quad \forall\,0\le t\le 1.$

Hence, by (26) and the uniform continuity of $q_0$, which is implied by (F01),

(27)  $\sup_{0\le t\le 1}|q_n(t) - q_0(t)|\to 0.$

Next, let $\bar g_n(t) := \int I(F_n(x)\le t)\psi(x)f_0(x)\,dx$, $0\le t\le 1$. Upon rewriting $g_n(t) = \int I(F_n(x)\le t)\psi(x)f_n(x)\,dx$, from (14*), Scheffé's Theorem (Lehmann, 1986, p. 573) and the boundedness of $\psi$, we readily obtain that

$\sup_{0\le t\le 1}|g_n(t) - \bar g_n(t)|\le\|\psi\|_\infty\int|f_n(x) - f_0(x)|\,dx\to 0.$

But the inequality $F_0(x) - \delta_n\le F_n(x)\le F_0(x) + \delta_n$ for all $x$ implies that

$|\bar g_n(t) - g_0(t)|\le\|\psi\|_\infty\int I(F_0(x) - \delta_n\le t\le F_0(x) + \delta_n)\,dF_0(x)\le 2\|\psi\|_\infty\delta_n, \quad \forall\,0\le t\le 1.$

Hence, by (25),

(28)  $\sup_{0\le t\le 1}|g_n(t) - g_0(t)|\to 0.$

Again by the boundedness of $\psi$, (14*) and (25), one readily concludes that

(29)  $\kappa_n\to\kappa, \qquad \sigma_n^2\to\sigma^2.$

Claim (24) now readily follows from (27)-(29).

Now recall (12) and rewrite $\tilde W^*$ as

(30)  $\tilde W^*(t) = \bar W^*(t) - \kappa_nq_n(t)\sum_iAx_{ni}[\psi(e^*_{ni}) - E_n\psi(e^*_{n1})].$

Observe that because

$E_n\big\|\sum_iAx_{ni}[\psi(e^*_{ni}) - E_n\psi(e^*_{n1})]\big\|^2 = p\sigma_n^2,$

by (29) and the Markov inequality it follows that

(31)  $\big\|\sum_iAx_{ni}[\psi(e^*_{ni}) - E_n\psi(e^*_{n1})]\big\| = O_p(1), \quad (P_n).$

Apply Corollary 2.2a.1 $p$ times, the $j$th time to the entities given at (18), to conclude that, for every $\epsilon > 0$,

$\lim_{\eta\to 0}\limsup_nP_n\Big(\sup_{|t-s|<\eta}\|\bar W^*(t) - \bar W^*(s)\| > \epsilon\Big) = 0.$

This, together with (31), (30), (27) and the uniform continuity of $F_0$, implies that the sequence of processes $\{\tilde W^*\}$ is tight in the uniform metric $\rho$ and all its subsequential limits must be in $\{C[0,1]\}^p$. Now, (17) follows from this, Claim (24), (20), (13), (14) and (6). □

Remark 6.2b.1. One of the main consequences of (17) is that one can use the bootstrap analogue of $\hat D_3$, viz., $D_3^* := \sup\{\|W^*(t)\|, 0\le t\le 1\}$, to carry out the test of $H_0$. Thus an approximation to the null distribution of $\hat D_3$ is obtained from the distribution of $D_3^*$ under $P_n$. In practice this means obtaining repeated random samples of size $n$ from $F_n$, computing the frequency distribution of $D_3^*$ from these samples, and using it to approximate the null distribution of $\hat D_3$. At least asymptotically, this converges to the right distribution. Obviously, the smooth bootstrap distributions for $\hat D_1$, $\hat D_2$ can be obtained similarly.
The reader might have realized that the conclusion (17) is true for any sequences of estimators $\{\hat\beta\}$, $\{\beta^*\}$ satisfying (2) and (16). □
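The recipe of Remark 6.2b.1 can be sketched in Python as follows. Several choices here are assumptions made only for illustration and are not prescribed by the text: $\psi$ is the Huber score $\psi(x) = \max(-k,\min(x,k))$ with $k = 1.345$ (bounded, nondecreasing, and with $\int\psi\,dF_0 = 0$ for symmetric $F_0$); $f_n$ is a Gaussian kernel estimate with a rule-of-thumb bandwidth; $E_n\psi(e^*_{n1})$ is computed by Gauss–Hermite quadrature; and $F_0 = N(0,1)$ is used for the observed $\hat D_3$. Since $F_n$ is continuous and strictly increasing, $\sup_t\|W^*(t)\| = \sup_y\|\sum_iAx_{ni}\{I(Y^*_{ni} - x_{ni}'\beta^*\le y) - F_n(y)\}\|$, which the helper `sup_euclid` evaluates at the jump points and their left limits.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_erf = np.vectorize(math.erf)


def Phi(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def huber_psi(r, k=1.345):
    return np.clip(r, -k, k)


def m_estimate(X, Y, center=0.0, k=1.345, iters=100):
    """Newton solution of sum_i x_i {psi(Y_i - x_i's) - center} = 0, as in (9)."""
    s = np.linalg.lstsq(X, Y, rcond=None)[0]
    for _ in range(iters):
        r = Y - X @ s
        grad = X.T @ (huber_psi(r, k) - center)
        J = X.T @ (X * (np.abs(r) < k)[:, None])
        step = np.linalg.solve(J, grad)
        s = s + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return s


def sup_euclid(resid, C, cdf):
    """sup_y || sum_i C[:, i] {I(resid_i <= y) - cdf(y)} || via jump points."""
    order = np.argsort(resid)
    u = cdf(resid[order])
    cum = np.concatenate([np.zeros((C.shape[0], 1)),
                          np.cumsum(C[:, order], axis=1)], axis=1)
    total = cum[:, -1:]
    at_jump = cum - total * np.concatenate([[0.0], u])
    before_jump = cum - total * np.concatenate([u, [1.0]])
    return max(np.linalg.norm(at_jump, axis=0).max(),
               np.linalg.norm(before_jump, axis=0).max())


def smooth_bootstrap_D3(X, Y, B=200, k=1.345, seed=0):
    rng = np.random.default_rng(seed)
    n = len(Y)
    w, V = np.linalg.eigh(X.T @ X)
    C = (V @ np.diag(w ** -0.5) @ V.T) @ X.T   # columns A x_i
    beta_hat = m_estimate(X, Y, 0.0, k)
    e_hat = Y - X @ beta_hat
    h = 1.06 * e_hat.std() * n ** (-0.2)       # bandwidth: an assumption

    def F_n(y):                               # kernel d.f. of the residuals
        return Phi((np.asarray(y)[:, None] - e_hat[None, :]) / h).mean(axis=1)

    nodes, weights = hermegauss(40)
    En_psi = (huber_psi(e_hat[:, None] + h * nodes[None, :], k) @ weights).mean()
    En_psi /= math.sqrt(2.0 * math.pi)
    D3_obs = sup_euclid(e_hat, C, Phi)        # observed D_3 under F0 = N(0, 1)
    boot = np.empty(B)
    for b in range(B):
        e_star = rng.choice(e_hat, n) + h * rng.standard_normal(n)  # draw from F_n
        Y_star = X @ beta_hat + e_star                            # (8)
        beta_star = m_estimate(X, Y_star, En_psi, k)              # (9)
        boot[b] = sup_euclid(Y_star - X @ beta_star, C, F_n)      # D_3^*
    return D3_obs, boot


rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
Y = X @ np.array([0.5, 1.0]) + rng.standard_normal(n)
D3, boot = smooth_bootstrap_D3(X, Y, B=100)
print(D3, np.mean(boot >= D3))  # observed statistic and bootstrap p-value
```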
6.3. L2-DISTANCE TESTS


Let $K_1$ and $K_2$, respectively, stand for the $K_1$ and $K_2$ of (5.2.5) and (5.2.7) after the d.f.'s $\{H_{ni}\}$ there are replaced by $F_0$. Thus, for $G\in DI(\mathbb R)$,

(1)  $K_1(t) := \int\{W_1(y,t)\}^2dG(y), \qquad K_2(t) := \int\|W^\circ(y,t)\|^2dG(y), \quad t\in\mathbb R^p,$

where $W^\circ$ is as in (6.1.3) and

(2)  $W_1(y,t) := n^{1/2}[H_n(y,t) - F_0(y)], \quad y\in\mathbb R,\ t\in\mathbb R^p.$

Let $\hat\beta$ be an estimator of $\beta$ and define the four test statistics

(3)  $\hat K_j := \inf\{K_j(t); t\in\mathbb R^p\}, \qquad \tilde K_j := K_j(\hat\beta), \quad j = 1, 2.$

Large values of these statistics are significant for testing $H_0$. We shall first discuss the a.n.d.'s of $\hat K_j$, $j = 1, 2$. Let $W_1(\cdot)$, $W^\circ(\cdot)$ stand for $W_1(\cdot,\beta)$ and $W^\circ(\cdot,\beta)$.
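When $G = F_0$ the integrals in (1) can be evaluated exactly, as for the classical Cramér–von Mises statistic. Writing $u_i = F_0(\hat e_i)$ for the residuals $\hat e_i = Y_{ni} - x_{ni}'t$, one has $\int_0^1\{I(u_i\le v) - v\}\{I(u_k\le v) - v\}\,dv = 1/3 - \max(u_i,u_k) + (u_i^2 + u_k^2)/2$. Hence $K_2(t) = \mathrm{tr}(AX'MXA)$ with $M_{ik}$ equal to that expression, and $K_1(t)$ is the same with $XA$ replaced by $n^{-1/2}\mathbf 1$. The Python sketch below (names are ours; $F_0 = G = N(0,1)$ is an assumption) computes $\tilde K_j = K_j(\hat\beta)$. The minimum distance versions $\hat K_j$ would additionally require a numerical minimization over $t$.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def l2_statistic(resid, C, cdf=std_normal_cdf):
    """sum_j int (sum_i C[j, i]{I(resid_i <= y) - F0(y)})^2 dF0(y), i.e. G = F0."""
    u = cdf(resid)
    M = 1.0 / 3.0 - np.maximum.outer(u, u) + 0.5 * (u[:, None] ** 2 + u[None, :] ** 2)
    return float(np.trace(C @ M @ C.T))


def K_tilde(X, Y, beta_hat):
    """Return (K_1(beta_hat), K_2(beta_hat)) for F0 = G = N(0, 1)."""
    n = len(Y)
    resid = Y - X @ beta_hat
    w, V = np.linalg.eigh(X.T @ X)
    C2 = (V @ np.diag(w ** -0.5) @ V.T) @ X.T   # rows of A X'
    C1 = np.full((1, n), n ** -0.5)
    return l2_statistic(resid, C1), l2_statistic(resid, C2)


rng = np.random.default_rng(4)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
Y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(K_tilde(X, Y, beta_hat))
```

With $p = 1$ and $x_{ni}\equiv 1$ both statistics reduce to the usual Cramér–von Mises statistic $1/(12n) + \sum_i\{u_{(i)} - (2i-1)/(2n)\}^2$.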


Theorem 6.3.1. Assume that (1.1.1), $H_0$, (NX), (5.5.68)-(5.5.70) with $F = F_0$ hold.

(a) If, in addition, (5.6a.10) and (5.6a.11) hold, then

(4)  $\hat K_1 = \int\Big\{W_1(y) - f_0(y)\frac{\int W_1f_0\,dG}{\int f_0^2\,dG}\Big\}^2dG(y) + o_p(1).$

(b) Under no additional assumptions,

(5)  $\hat K_2 = \int\Big\|W^\circ(y) - f_0(y)\frac{\int W^\circ f_0\,dG}{\int f_0^2\,dG}\Big\|^2dG(y) + o_p(1).$

Proof. Apply Theorems 5.5.1 and 5.5.3 twice, once with $D = n^{-1/2}[\mathbf 1, 0, \dots, 0]$ and once with $D = XA$, and the rest of the entities as follows:

(6)  $Y_{ni} = e_{ni}, \quad H_{ni}\equiv F_0\equiv F_{ni}, \quad G_n\equiv G.$

The theorem then follows from (5.5.28), (5.6a.5), (5.6a.12) and some algebra. See also Claim 5.5.2. □


Remark 6.3.1. Perhaps it is worthwhile repeating that (5) holds without any extra conditions on the design matrix $X$. Thus, at least in this sense, $\hat K_2$ is a more natural statistic to use than $\hat K_1$ for testing $H_0$.
A consequence of (4) is that even if $\hat\beta_1$ of (5.2.4) is asymptotically non-unique, $\hat K_1$ asymptotically behaves like a unique sequence of r.v.'s. Moreover, unlike the $D_1$-statistic, the asymptotic null distribution of $\hat K_1$ does not depend on the design matrix among all those designs that satisfy the given conditions.

The assumptions (5.6a.10) and (5.6a.11) are restrictive. For example, in the case $p = 1$, (5.6a.10) translates to requiring that either $x_{ni}\ge 0$ for all $i$ or $x_{ni}\le 0$ for all $i$. The assumption (5.6a.11) says that $\bar x_n$ can not converge to 0. Compare this with the fact that if $\bar x_n\to 0$, then the asymptotic distribution of $\hat D_1$ does not depend on the preliminary estimator $\hat\beta$. □


Next, we need a result that will be useful in deriving the limiting distributions of certain quadratic forms involving w.e.p.'s. To that effect, let $L_2^p(\mathbb R, G)$ be the equivalence classes of measurable functions $h:\mathbb R\to\mathbb R^p$ such that $|h|_G^2 := \int\|h\|^2dG < \infty$. The equivalence classes are defined in terms of the norm $|\cdot|_G$. In the following lemma, $\{a_i; i\ge 1\}$ is a fixed orthonormal basis in $L_2(\mathbb R, G)$.

Lemma 6.3.1. Let $\{Z_n, n\ge 1\}$ be a sequence of $p$-vector stochastic processes with $EZ_n\equiv 0$, $\mathrm{Cov}(Z_n(x), Z_n(y)) =: K_n(x,y) = ((K_{nij}(x,y)))$, $1\le i,j\le p$, $x,y\in\mathbb R$. In addition, assume the following:

There is a covariance matrix function $K(x,y) = ((K_{ij}(x,y)))$ and a $p$-vector mean zero covariance-$K$ Gaussian process $Z$ such that

(i) (a) $\sum_{j=1}^p\int K_{njj}(x,x)\,dG(x) < \infty$, $n\ge 1$. (b) $\sum_{j=1}^p\int K_{jj}(x,x)\,dG(x) < \infty$.
(ii) $\sum_{j=1}^p\int K_{njj}(x,x)\,dG(x)\to\sum_{j=1}^p\int K_{jj}(x,x)\,dG(x)$.
(iii) For every $m\ge 1$, $\big(\int Z_na_1\,dG,\dots,\int Z_na_m\,dG\big)\to_d\big(\int Za_1\,dG,\dots,\int Za_m\,dG\big)$.
(iv) For each $i\ge 1$, $E\|\int Z_na_i\,dG\|^2\to E\|\int Za_i\,dG\|^2$.

Then $Z_n$, $Z$ belong to $L_2^p(\mathbb R, G)$, and

(7)  $Z_n\Rightarrow Z$ in $L_2^p(\mathbb R, G)$.




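Proof. In view of Theorem VI.2.2 of Parthasarathy (1967) and in view of (iii), it suffices to show that for any $\epsilon > 0$ there is an $N$ ($= N_\epsilon$) such that

(8)  $\sup_n\sum_{i>N}E\|\int Z_na_i\,dG\|^2 < \epsilon.$

Because of the properties of $\{a_i\}$, Fubini and (i),

(9)  $\sum_{j=1}^p\int K_{njj}(x,x)\,dG(x) = E|Z_n|_G^2 = \sum_{i=1}^\infty E\|\int Z_na_i\,dG\|^2,$
(10)  $\sum_{j=1}^p\int K_{jj}(x,x)\,dG(x) = E|Z|_G^2 = \sum_{i=1}^\infty E\|\int Za_i\,dG\|^2.$

Thus, to prove (8), it suffices to exhibit an $N$ such that

(11)  $\sup_{n\ge N}\sum_{i>N}E\|\int Z_na_i\,dG\|^2 < \epsilon.$

By (ii), (9) and (10), there exists $N_{1\epsilon}$ such that

(12)  $\sum_{i=1}^\infty E\|\int Z_na_i\,dG\|^2\le\sum_{i=1}^\infty E\|\int Za_i\,dG\|^2 + \epsilon/3, \quad n\ge N_{1\epsilon}.$

By (i)(b) and (10), there exists $N$ ($= N_\epsilon$) such that

(13)  $\sum_{i>N}E\|\int Za_i\,dG\|^2 < \epsilon/3.$

By (iv), there exists $N_{2\epsilon}$ such that

(14)  $\sum_{i=1}^NE\|\int Za_i\,dG\|^2\le\sum_{i=1}^NE\|\int Z_na_i\,dG\|^2 + \epsilon/3, \quad n\ge N_{2\epsilon}.$

Therefore, from (12)-(14), for all $n\ge N_{1\epsilon}\vee N_{2\epsilon}$,

$\sum_{i>N}E\|\int Z_na_i\,dG\|^2\le\Big[\sum_{i=1}^\infty E\|\int Za_i\,dG\|^2 - \sum_{i=1}^NE\|\int Z_na_i\,dG\|^2\Big] + \epsilon/3\le\sum_{i>N}E\|\int Za_i\,dG\|^2 + 2\epsilon/3 < \epsilon.$

Use (i)(a), enlarging $N$ if necessary, to take care of the finitely many $n < N_{1\epsilon}\vee N_{2\epsilon}$. This proves the result. □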


Remark 6.3.2. Millar (1981) contains a special case of the above lemma where $p = 1$, $Z_n$ is the standardized ordinary e.p. and $Z$ is the Brownian bridge. The above lemma extends Millar's result to cover more general processes, such as the w.e.p.'s, in a general independent setting. In applications of the above lemma, one may choose $\{a_i\}$ such that the support $S_i$ of $a_i$ has $G(S_i) < \infty$, $i\ge 1$, and such that the $\{a_i\}$ are bounded. □


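Corollary 6.3.1. (a) Under the conditions of Theorem 6.3.1(a),

(15)  $\hat K_1\to_d\int\Big\{B(F_0) - f_0\frac{\int B(F_0)f_0\,dG}{\int f_0^2\,dG}\Big\}^2dG =: G_1$, (say).

(b) Under the conditions of Theorem 6.3.1(b),

(16)  $\hat K_2\to_d\int\Big\|\mathbf B(F_0) - f_0\frac{\int\mathbf B(F_0)f_0\,dG}{\int f_0^2\,dG}\Big\|^2dG =: G_2$, (say).

Here $B$ and $\mathbf B$ are as in (6.2a.7) and (6.2a.8).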
Proof. (b) Apply Lemma 6.3.1, with $a_i$ as in Remark 6.3.2 above,

$Z_n := W^\circ - f_0\frac{\int W^\circ f_0\,dG}{\int f_0^2\,dG}, \qquad Z := \mathbf B(F_0) - f_0\frac{\int\mathbf B(F_0)f_0\,dG}{\int f_0^2\,dG}.$

Direct calculations show that $EZ_n\equiv 0\equiv EZ$ and, $\forall\,x,y\in\mathbb R$,

$K_n(x,y) := EZ_n(x)Z_n(y)' = I_{p\times p}K(x,y) =: EZ(x)Z(y)',$

where, for $x,y\in\mathbb R$,

$K(x,y) := k(x,y) - \alpha^{-1}f_0(y)\int k(x,s)\,d\psi(s) - \alpha^{-1}f_0(x)\int k(y,s)\,d\psi(s) + \alpha^{-2}f_0(x)f_0(y)\iint k(s,t)\,d\psi(s)\,d\psi(t),$

$k(x,y) := F_0(x\wedge y) - F_0(x)F_0(y), \qquad \psi(x) := \int_{-\infty}^xf_0\,dG, \qquad \alpha := \psi(\infty).$

Therefore, (5.5.68), (5.5.69) imply (i), (ii) and (iv). To prove (iii), let $\lambda_1,\dots,\lambda_m$ be real numbers. Then,

$\sum_{j=1}^m\lambda_j\int Z_na_j\,dG = \int W^\circ b\,dG - \alpha^{-1}\int W^\circ\,d\psi\int f_0b\,dG =: h(W^\circ)$, (say),

where $b := \sum_{j=1}^m\lambda_ja_j$. Because $\psi$ and $b\,dG$ are finite measures, $h(W^\circ)$ is a uniformly continuous function of $W^\circ$. Thus, by Lemma 6.2a.2 and Theorem 5.1 of Billingsley (1968), $h(W^\circ)\to_d h(\mathbf B(F_0))$ under $H_0$ and (NX). This verifies all conditions of Lemma 6.3.1. Hence $Z_n\Rightarrow Z$ in $L_2^p(\mathbb R, G)$. In particular, $\int\|Z_n\|^2dG\to_d\int\|Z\|^2dG$. This and (5) prove (16). The proof of (15) is similar. □


Remark 6.3.3. The r.v. $G_1$ can be rewritten as

$G_1 = \int B^2(F_0)\,dG - \frac{\{\int B(F_0)f_0\,dG\}^2}{\int f_0^2\,dG}.$

Recall that $G_1$ is the same as the limiting r.v. obtained in the one-sample location model. Its distribution for various $G$ and $F_0$ has been theoretically studied by Martynov (1975). Boos (1981) has tabulated some critical values of $G_1$ when $dG = \{F_0(1 - F_0)\}^{-1}dF_0$ and $F_0$ = logistic. From Anderson–Darling or Boos one obtains that in this case

$G_1 = \int_0^1\frac{B^2(t)}{t(1-t)}\,dt - 6\Big(\int_0^1B(t)\,dt\Big)^2 = \sum_{j=2}^\infty\frac{N_j^2}{j(j+1)},$

where $\{N_j\}$ are i.i.d. $N(0,1)$ r.v.'s. From Boos (Table 3), one obtains the following

Table II

α       .005    .01     .025    .05
t_α     1.710   1.505   1.240   1.046

In Table II, $t_\alpha$ is such that $P(G_1 > t_\alpha) = \alpha$. For some other tables see Stephens (1979).


The r.v. $G_2$ can be rewritten as

$G_2 = \int\|\mathbf B(F_0)\|^2dG - \frac{\|\int\mathbf B(F_0)f_0\,dG\|^2}{\int f_0^2\,dG} = \sum_{j=1}^p\Big[\int B_j^2(F_0)\,dG - \frac{(\int B_j(F_0)f_0\,dG)^2}{\int f_0^2\,dG}\Big],$

which is a sum of $p$ independent r.v.'s identically distributed as $G_1$. The distribution of such r.v.'s does not seem to have been studied yet. Until the distribution of $G_2$ is tabulated, one could use the independence of the summands in $G_2$ and the bounds between the sum and the maximum to obtain a crude approximation to the significance level.
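In the logistic, Anderson–Darling case, the series representation of $G_1$ makes simulation straightforward, and $G_2$ is a sum of $p$ independent copies. The Python sketch below is ours; the truncation point $J$ and the number of replications are arbitrary choices. It draws from the truncated series, so it can be used to check Table II and to approximate tail probabilities of $G_2$. It also evaluates one crude bound of the kind mentioned above: since the summands are nonnegative, $G_2$ is at least their maximum, so $P(G_2 > c)\ge 1 - \{1 - P(G_1 > c)\}^p$.

```python
import numpy as np


def simulate_G(p=1, J=200, reps=10000, seed=0):
    """Draws of a sum of p independent copies of sum_{j=2}^J N_j^2 / (j (j + 1))."""
    rng = np.random.default_rng(seed)
    j = np.arange(2, J + 1)
    lam = 1.0 / (j * (j + 1.0))
    total = np.zeros(reps)
    for _ in range(p):
        total += (rng.standard_normal((reps, lam.size)) ** 2) @ lam
    return total


g1 = simulate_G(p=1)
for t_alpha, alpha in [(1.710, .005), (1.505, .01), (1.240, .025), (1.046, .05)]:
    print(alpha, np.mean(g1 > t_alpha))  # compare with Table II

p, c = 3, 2.5
g2 = simulate_G(p=p, seed=1)
lower = 1.0 - (1.0 - np.mean(g1 > c)) ** p   # max-based lower bound
print(np.mean(g2 > c), lower)
```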


For $p = 1$, the a.n.d. of $\hat K_1$ and $\hat K_2$ is the same, but the conditions under which the results for $\hat K_1$ hold are stronger than those for $\hat K_2$. □

The next result gives an approximation for $\tilde K_j$, $j = 1, 2$. It also follows from Theorem 5.5.1 in a fashion similar to the previous theorem, and hence no details are given.


Theorem 6.3.2. Assume that (1.1.1), $H_0$, (NX), (5.5.68)-(5.5.70) with $F = F_0$ and (6.2a.3) hold. Then,

(17)  $\tilde K_1 = \int[W_1(y) + n^{1/2}\bar x_n'AA^{-1}(\hat\beta - \beta)f_0(y)]^2dG(y) + o_p(1),$
      $\tilde K_2 = \int\|W^\circ(y) + A^{-1}(\hat\beta - \beta)f_0(y)\|^2dG(y) + o_p(1).$

From this we can obtain the asymptotic null distribution of these statistics when $\beta$ is estimated efficiently for large samples, as follows. Recall the definition of $\{s_i\}$ from (6.2a.13) and let

(18)  $v_i(y) := I(e_i\le y) - F_0(y) + n\bar x_n'(X'X)^{-1}x_is_iI_0^{-1}f_0(y),$
      $a_i(y) := I(e_i\le y) - F_0(y) + s_iI_0^{-1}f_0(y), \quad 1\le i\le n,\ y\in\mathbb R,$
      $a := (a_1,\dots,a_n)', \qquad v := (v_1,\dots,v_n)'.$

Also, define

(19)  $Z_{n1}(y) := W_1(y) + n^{1/2}\bar x_n'AAX's\,I_0^{-1}f_0(y) = n^{-1/2}\sum_{i=1}^nv_i(y),$
      $Z_{n2}(y) := W^\circ(y) + AX's\,I_0^{-1}f_0(y) = AX'a(y), \quad y\in\mathbb R.$
From Theorem 6.3.2 we readily obtain the

Corollary 6.3.2. Assume that (1.1.1), $H_0$, (NX), (5.5.68)-(5.5.70) with $F = F_0$, (6.2a.12) and (6.2a.14) hold. Then,

(20)  $\tilde K_1 = \int Z_{n1}^2\,dG + o_p(1),$

(21)  $\tilde K_2 = \int\|Z_{n2}\|^2\,dG + o_p(1).$ □


Next, observe that for $y\le z$,

$K_{n1}(y,z) := \mathrm{Cov}(Z_{n1}(y), Z_{n1}(z)) = F_0(y)(1 - F_0(z)) - n\bar x_n'(X'X)^{-1}\bar x_n\,I_0^{-1}f_0(y)f_0(z) =: \rho_n(y,z),$

$K_{n2}(y,z) := EZ_{n2}(y)Z_{n2}(z)' = \{F_0(y)(1 - F_0(z)) - I_0^{-1}f_0(y)f_0(z)\}I_{p\times p} =: \rho(y,z)I_{p\times p}$, say.


Now apply Lemma 6.3.1 and argue just as in the proof of Corollary 6.3.1 to conclude

Corollary 6.3.3. (a) In addition to the conditions of Corollary 6.3.2, assume that

(22)  $n\bar x_n'(X'X)^{-1}\bar x_n\to c, \quad |c| < \infty.$

Then,

(23)  $\tilde K_1\to_d\int Z_1^2(y)\,dG(y),$

where $Z_1$ is a Gaussian process in $L_2(\mathbb R, G)$ with the covariance function

(24)  $K(x,y) := F_0(x)(1 - F_0(y)) - cI_0^{-1}f_0(x)f_0(y), \quad x\le y.$

(b) Under the conditions of Corollary 6.3.2,

(25)  $\tilde K_2\to_d\int\|Y_0\|^2\,dG,$

where $Y_0$ is a vector of $p$ independent Gaussian processes in $L_2(\mathbb R, G)$ with the covariance matrix $\rho I_{p\times p}$. □


Remark 6.3.4. Again, observe that the test statistic $\tilde K_1$, based on the ordinary empirical process of the residuals, has an a.n.d. which is design dependent, whereas the a.n.d. of the test based on the weighted empiricals, $\tilde K_2$, is design free. In fact, for $p = 1$, the limiting r.v. in (25) is the same as the one that appears in the one-sample location model. For $G = F_0 = N(0,1)$ d.f., Martynov (1976) has tabulated the distribution of this r.v. Stephens (1976) has also tabulated the distribution of this r.v. for $G = F_0$ and for $dG = dG_0 = \{F_0(1 - F_0)\}^{-1}dF_0$, with $F_0 = N(0,1)$. For $G = F_0$, $F_0 = N(0,1)$ d.f., Stephens' and Martynov's tables generally agree up to two decimal places, though occasionally there is agreement up to three decimal places. In any case, for $p = 1$, one could use these tables to implement the test based on $\tilde K_2$, at least asymptotically, whereas the test based on $\tilde K_1$, being design dependent, can not be readily implemented. For the sake of convenience we reproduce some of the Stephens (1976, 1979) tables below.
Table III
$F_0 = N(0, 1)$

α           .01     .025    .05     .10
K2(F0)      .237    .196    .165    .135
K2(G0)     1.541   1.281   1.088    .897

In Table III, $K_2(G)$ stands for the $\tilde K_2$ with $G$ being the integrating measure, and $K_2(G_0)$ is the $\tilde K_2$ with the Anderson–Darling weights. Table III is, of course, useful only when $p = 1$. □


As far as the asymptotic power of the above L2-tests is concerned, it is apparent that Theorems 5.5.1, 5.5.3 and Lemma 6.3.1 can be used to deduce the asymptotic power of these tests against fairly general alternatives. Here we shall discuss the asymptotic behavior of only $\hat K_j$, $j = 1, 2$, under heteroscedastic gross errors alternatives. More precisely, suppose that

(26)  $F_{ni} = (1 - \delta_{ni})F_0 + \delta_{ni}F_1, \quad 0\le\delta_{ni}\le 1, \quad \max_i\delta_{ni}\to 0,$

with $F_1$ a fixed d.f. Let

$m_1 := n^{-1/2}\sum_i\delta_{ni}(F_1 - F_0), \qquad m_2 := \sum_iAx_{ni}\delta_{ni}(F_1 - F_0).$
Lemma 6.3.2. Let (1.1.1) hold with $e_{ni}$ having the d.f. $F_{ni}$ given by (26), $1\le i\le n$. Suppose that $X$ satisfies (NX); that $(F_0, G)$ and $(F_1, G)$ satisfy (5.5.68)-(5.5.70); and that

(27)  $\int|F_1 - F_0|\,dG < \infty.$

(a) If, in addition, (5.6a.10) and (5.6a.11) hold, then

(28)  $\hat K_1 = \int\Big\{W_1 + m_1 - f_0\frac{\int(W_1 + m_1)f_0\,dG}{\int f_0^2\,dG}\Big\}^2dG + o_p(1),$

provided

(29)  $n^{-1/2}\sum_i\delta_{ni} = O(1).$

(b) Without any additional conditions,

(30)  $\hat K_2 = \int\Big\|W^\circ + m_2 - f_0\frac{\int(W^\circ + m_2)f_0\,dG}{\int f_0^2\,dG}\Big\|^2dG + o_p(1),$

provided

(31)  $\sum_iAx_{ni}\delta_{ni} = O(1).$

Proof. Apply Theorem 5.5.1 and (5.5.49) to $D = n^{-1/2}[\mathbf 1, 0, \dots, 0]$, $Y_{ni} = e_{ni}$, $H_{ni}\equiv F_0$, and $\{F_{ni}\}$ given by (26), to conclude (a). Apply the same results to $D = XA$ and the rest of the entities as in the proof of (a) to conclude (b). □


Now apply Lemma 6.3.1 to

(30)  $Z_n := W_1 + m_1 - f_0\frac{\int(W_1 + m_1)f_0\,dG}{\int f_0^2\,dG}, \qquad Z := B(F_0) + a_1(F_1 - F_0) - f_0\frac{\int\{B(F_0) + a_1(F_1 - F_0)\}f_0\,dG}{\int f_0^2\,dG},$

where $a_1 := \limsup_nn^{-1/2}\sum_i\delta_{ni}$, to obtain

Corollary 6.3.4. Under the conditions of Lemma 6.3.2(a), $\hat K_1\to_d\int Z^2\,dG$, where $Z$ is as in (30). □

Similarly, apply Lemma 6.3.1 to

(31)  $Z_n := W^\circ + m_2 - f_0\frac{\int(W^\circ + m_2)f_0\,dG}{\int f_0^2\,dG}, \qquad Z := \mathbf B(F_0) + a_2(F_1 - F_0) - f_0\frac{\int\{\mathbf B(F_0) + a_2(F_1 - F_0)\}f_0\,dG}{\int f_0^2\,dG},$

where $a_2 := \limsup_n\sum_iAx_{ni}\delta_{ni}$, to obtain

Corollary 6.3.5. Under the conditions of Lemma 6.3.2(b), $\hat K_2\to_d\int\|Z\|^2\,dG$, where $Z$ is as in (31). □

An interesting choice is $\delta_{ni} = p^{-1/2}\|Ax_{ni}\|$. Another choice is $\delta_{ni}\equiv n^{-1/2}$. Both a priori satisfy (26), (29) and (31). □
6.4. TESTING WITH UNKNOWN SCALE


Now consider (1.1.1) and the problem of testing $H_1$ of (6.1.4). Here we shall discuss the modifications of $D_j$, $K_j$, $j = 1, 2$, of Sections 6.2, 6.3 that will be suitable for $H_1$. With $W_1$, $W^\circ$ as before, define

(1)  $D_1(a,u) := \sup_y|W_1(ay,u)|, \qquad D_2(a,u) := \sup_y\|W^\circ(ay,u)\|,$
     $K_1(a,u) := \int\{W_1(ay,u)\}^2dG(y), \qquad K_2(a,u) := \int\|W^\circ(ay,u)\|^2dG(y), \quad a > 0,\ u\in\mathbb R^p.$

Let $(\hat\sigma,\hat\beta)$ be estimators of $(\sigma,\beta)$, and let $\hat D_j$ and $\hat K_j$ stand for $D_j(\hat\sigma,\hat\beta)$ and $K_j(\hat\sigma,\hat\beta)$, respectively, $j = 1, 2$. The following two theorems give the a.n.d.'s of these statistics. Theorem 6.4.1 follows from Corollary 2.3.4 in a similar fashion as does Theorem 6.2.1 from Corollaries 2.3.3 and 2.3.5. Theorem 6.4.2 follows from Theorem 5.5.8 in a similar fashion as does Theorem 6.3.2 from Theorem 5.5.1. Recall the conditions (F01) and (F03) from Section 2.3.

Theorem 6.4.1. In addition to (1.1.1) and $H_1$, assume that (NX), (F01), (F03) and the following hold.

(2)  (a) $|n^{1/2}(\hat\sigma - \sigma)\sigma^{-1}| = O_p(1)$. (b) $\|A^{-1}(\hat\beta - \beta)\| = O_p(1)$.

Then,

$\hat D_1 = \sup_t\big|W_1(t) + q_0(t)\{n^{1/2}\bar x_n'(\hat\beta - \beta) + n^{1/2}(\hat\sigma - \sigma)F_0^{-1}(t)\}\sigma^{-1}\big| + o_p(1),$

and

$\hat D_2 = \sup_t\big\|W(t) + q_0(t)\{A^{-1}(\hat\beta - \beta) + n^{1/2}A\bar x_n\,n^{1/2}(\hat\sigma - \sigma)F_0^{-1}(t)\}\sigma^{-1}\big\| + o_p(1),$

where now $W_1(\cdot) := W_1(\sigma F_0^{-1}(\cdot),\beta)$ and $W(\cdot) := W^\circ(\sigma F_0^{-1}(\cdot),\beta)$.
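Reading $W_1(ay,u)$ as $n^{-1/2}\sum_i\{I(Y_{ni} - x_{ni}'u\le ay) - F_0(y)\}$, which is the centering under which Theorem 6.4.1 is stated, the statistic $\hat D_1$ is simply the Kolmogorov distance between $F_0$ and the empirical d.f. of the standardized residuals $(Y_{ni} - x_{ni}'\hat\beta)/\hat\sigma$. The Python sketch below assumes $F_0 = N(0,1)$ and, purely for illustration, uses the l.s.e. and the residual standard deviation for $(\hat\beta,\hat\sigma)$; any estimators satisfying (2) could be substituted.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def d1_with_scale(X, Y, cdf=std_normal_cdf):
    """D_1(sigma_hat, beta_hat): Kolmogorov distance of standardized residuals to F0."""
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta_hat
    sigma_hat = math.sqrt(resid @ resid / (n - p))   # illustrative scale estimate
    u = np.sort(cdf(resid / sigma_hat))
    i = np.arange(1, n + 1)
    return math.sqrt(n) * max((i / n - u).max(), (u - (i - 1) / n).max())


rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
Y = X @ np.array([2.0, -1.0]) + 3.0 * rng.standard_normal(n)
print(d1_with_scale(X, Y))
```

Because both estimators used here are scale equivariant, rescaling $Y$ leaves $\hat D_1$ unchanged.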


Theorem 6.4.2. In addition to (1.1.1) and $H_1$, assume that (NX), (2), (5.5.69) with $F = F_0$, and the following hold.

(3)  $F_0$ has a continuous density $f_0$ such that

     (a) $0 < \int|y|^jf_0^k(y)\,dG(y) < \infty$, $j = 0$, $k = 1, 2$; $j = 2$, $k = 2$.
     (b) $\lim_{s\to 0}\limsup_n\int f_0^k(y + n^{-1/2}\tau + s)\,dG(y) = \int f_0^k\,dG(y)$, $k = 1, 2$, $\tau\in\mathbb R$.
     (c) $\lim_{s\to 0}\int|y|f_0(y(1+s))\,dG(y) = \int|y|f_0(y)\,dG(y)$.

Then,

$\hat K_1 = \int\big[W_1(\sigma y,\beta) + f_0(y)\{n^{1/2}\bar x_n'(\hat\beta - \beta) + n^{1/2}(\hat\sigma - \sigma)y\}\sigma^{-1}\big]^2dG(y) + o_p(1),$

$\hat K_2 = \int\big\|W^\circ(\sigma y,\beta) + f_0(y)\{A^{-1}(\hat\beta - \beta) + n^{1/2}A\bar x_n\,n^{1/2}(\hat\sigma - \sigma)y\}\sigma^{-1}\big\|^2dG(y) + o_p(1).$

Clearly, from these theorems one can obtain an analogue of Corollary 6.3.2 when $(\hat\sigma,\hat\beta)$ are chosen to be asymptotically efficient estimators.

As is the case in classical least squares theory or in the M-estimation methodology, neither of the two dispersions $K_1(a,u)$ and $K_2(a,u)$ can be used to satisfactorily estimate $(\sigma,\beta)$ by simultaneous minimization. The analogues of the m.d. goodness-of-fit tests that should be used are $\inf\{K_j(\hat\sigma,u); u\in\mathbb R^p\}$, $j = 1, 2$. The methodology of Section 5 may be used to obtain the asymptotic distributions of these statistics in a fashion similar to the above. □


6.5. TESTING FOR SYMMETRY OF THE ERRORS


Consider the model (1.1.1) and the hypothesis $H_s$ of symmetry of the errors specified at (6.1.5). The proposed tests are to be based on $D_{js}$, $j = 1, 2, 3$, of (6.1.6), (6.1.7), on $K_j^s(\hat\beta)$, and on $\inf\{K_j^s(t); t\in\mathbb R^p\}$, $j = 1, 2$, where

(1)  $K_1^s(t) := \int\{W_1^s(y,t)\}^2dG(y), \qquad K_2^s(t) := \int\|W^s(y,t)\|^2dG(y), \quad t\in\mathbb R^p,$

with $W_1^s$ and $W^s$ as in (6.1.8) and (6.1.9). Large values of these statistics are considered to be significant for $H_s$.

Although the results of Chapters 2 and 5 can be used to obtain their asymptotic behavior under fairly general alternatives, here we shall focus only on the a.n.d.'s of these tests. To state these, we need some more notation. For a d.f. $F$, define

(2)  $F_s(y) := F(y) - F(-y), \quad y\ge 0.$

Then, with $F^{-1}$ denoting the usual inverse of a d.f. $F$, we have
(3)  $F_s^{-1}(t) = F^{-1}((1+t)/2), \qquad -F_s^{-1}(t) = F^{-1}((1-t)/2), \quad 0\le t\le 1,$

for all $F$ that are continuous and symmetric around 0. Finally, let

(4)  $W_1^s(t) := W_1^s(F_s^{-1}(t),\beta), \qquad W^s(t) := W^s(F_s^{-1}(t),\beta), \qquad q(t) := f(F_s^{-1}(t)), \quad 0\le t\le 1.$

We are now ready to state and prove

Theorem 6.5.1. In addition to (1.1.1), $H_s$ and (NX), assume that $F$ in $H_s$ and the estimator $\hat\beta$ satisfy (F1) and

(5)  $\|A^{-1}(\hat\beta - \beta)\| = O_p(1)$, under $H_s$.

Then,

(6)  $D_{1s} = \sup_t|W_1^s(t) + 2q(t)\,n^{1/2}\bar x_n'AA^{-1}(\hat\beta - \beta)| + o_p(1),$
(7)  $D_{2s} = \sup_t|W^s(t) + 2q(t)\,A^{-1}(\hat\beta - \beta)| + o_p(1),$

and

(8)  $D_{3s} = \sup_t\|W^s(t) + 2q(t)\,A^{-1}(\hat\beta - \beta)\| + o_p(1).$

Proof. The proof follows from Theorem 2.3.1 in the following fashion. The details will be given only for (8), as they are the same for (7) and quite similar for (6). Because $F$ is continuous and symmetric around 0, and because $W^s(y,\cdot) = W^s(-y,\cdot)$, $D_{3s} = \sup_{0\le t\le 1}\|W^s(F_s^{-1}(t),\hat\beta)\|$. But, from the definition (6.1.8) and (3), it follows that for a $v\in\mathbb R^p$,

(9)  $W^s(F_s^{-1}(t),v) = \sum_iAx_{ni}\{I(e_{ni}\le F^{-1}((1+t)/2) + c_{ni}'u) + I(e_{ni}\le F^{-1}((1-t)/2) + c_{ni}'u) - 1\}$
     $\qquad = S((1+t)/2,u) + S((1-t)/2,u) - \sum_iAx_{ni}, \quad 0\le t\le 1,$

where

$S(t,u) := \sum_iAx_{ni}I(e_{ni}\le F^{-1}(t) + c_{ni}'u), \quad 0\le t\le 1,$

is a $p$-vector of $S_d$-processes of (2.3.1) with $X_{ni} = e_{ni}$, $F_{ni}\equiv F\equiv H$, $c_{ni} = Ax_{ni}$, $u = A^{-1}(v - \beta)$, and where the $j$th process has the weights $\{d_{ni}\}$ given by the $j$th column of $XA$. The assumptions about $F$ and $X$ imply all the assumptions of Theorem 2.3.1. Hence (8) follows from (2.3.6), (5) and (9) in an obvious fashion.
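Computing $D_{3s}$ does not require $F$: by (9) it only compares, at each $y\ge 0$, the residuals below $-y$ with those above $y$. The Python sketch below (ours) computes $\sup_{y\ge 0}\|A\sum_ix_{ni}\{I(\hat e_i\le y) + I(\hat e_i\le -y) - 1\}\|$ for residuals $\hat e_i = Y_{ni} - x_{ni}'\hat\beta$. The summand is piecewise constant in $y$ with breaks only at the $|\hat e_i|$, so it suffices to evaluate it at 0, at each $|\hat e_i|$, and at one point inside each gap. The least squares fit in the usage lines is only an example of an estimator satisfying (5).

```python
import numpy as np


def symmetry_sup_stat(X, resid):
    """sup_{y >= 0} || A sum_i x_i {I(e_i <= y) + I(e_i <= -y) - 1} ||, A = (X'X)^{-1/2}."""
    w, V = np.linalg.eigh(X.T @ X)
    C = (V @ np.diag(w ** -0.5) @ V.T) @ X.T   # p x n, columns A x_i
    a = np.unique(np.abs(resid))
    mids = (a[:-1] + a[1:]) / 2.0
    ys = np.concatenate([[0.0], a, mids, [a[-1] + 1.0]])
    if a[0] > 0:
        ys = np.concatenate([ys, [a[0] / 2.0]])  # a point in the gap (0, min |e_i|)
    ind = ((resid[:, None] <= ys[None, :]).astype(float)
           + (resid[:, None] <= -ys[None, :]) - 1.0)  # n x len(ys)
    return np.linalg.norm(C @ ind, axis=0).max()


rng = np.random.default_rng(6)
n = 150
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
Y = X @ np.array([0.0, 1.0]) + rng.standard_t(5, n)   # symmetric errors
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(symmetry_sup_stat(X, Y - X @ beta_hat))
```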


Next, we state an analogous result for the L2-distances. 
Theorem 6.5.2. In addition to (1.1.1), Hs, (NX) and (5), assume that F 


in Hs and the integrating measure G satisfy (5.3.8), (5.5. 68), (5.5.70) and 
(5.64.13). Then, 


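(10) $K_1^+(\hat\beta) = \int [W_1^+(y) + 2f(y)\, n^{1/2}\bar{x}_n'(\hat\beta - \beta)]^2\, dG(y) + o_p(1)$,


(11) $K_2^+(\hat\beta) = \int \|W^+(y) + 2f(y) A^{-1}(\hat\beta - \beta)\|^2\, dG(y) + o_p(1)$,
where $W_1^+(\cdot)$, $W^+(\cdot)$ now stand for $W_1^+(\cdot, \beta)$, $W^+(\cdot, \beta)$.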


Proof. The proof follows from two applications of Theorem 5.5.2, once
with $D = n^{-1/2}[\mathbf 1, 0, \dots, 0]$, where $\mathbf 1$ is the n-vector of ones, and once with $D = XA$. In both cases, take
$Y_{ni}$ and $F_{ni}$ of that theorem to be equal to $e_{ni}$ and F, $1 \le i \le n$,
respectively. Claim 5.5.2 justifies the applicability of that theorem
under the present assumptions. □


The next result is useful in obtaining the a.n.d.’s of the m.d. test 
statistics. Its proof uses Theorems 5.5.2 and 5.5.4 in a similar fashion as
Theorems 5.5.1 and 5.5.3 are used in the proof of Theorem 6.3.1, and hence
no details are given. Let

$\hat K_j^+ := \inf\{K_j^+(t);\ t \in \mathbb R^p\}$, $j = 1, 2$.


Theorem 6.5.3. Assume that (1.1.1), Hs, (NX), (5.3.8), (5.5.68), 
(5.5.70) and (5.6a.13) hold. 


(a) If, in addition, (5.6a.10) and (5.6a.11) hold, then 


(12) $\hat K_1^+ = 2\int_0^\infty \Big\{W_1^+(y) - f(y)\int_0^\infty W_1^+ f\,dG \Big(\int_0^\infty f^2\,dG\Big)^{-1}\Big\}^2 dG(y) + o_p(1)$.


(b) Under no additional assumptions, 


(13) $\hat K_2^+ = 2\int_0^\infty \Big\|W^+(y) - f(y)\int_0^\infty W^+ f\,dG \Big(\int_0^\infty f^2\,dG\Big)^{-1}\Big\|^2 dG(y) + o_p(1)$.
To obtain the a.n.d.'s of the given statistics from the above theorem, we
now apply Lemma 6.3.1 to the approximating processes. The details will be
given for $\hat K_2^+$ only, as they are similar for $\hat K_1^+$. Accordingly, let

(14) $Z_n(y) := W^+(y) - f(y)\int_0^\infty W^+ f\,dG \Big(\int_0^\infty f^2\,dG\Big)^{-1}$, $n \ge 1$, $y \ge 0$.


To determine the approximating r.v. for $\hat K_2^+$ we shall first obtain the
covariance matrix function for this $Z_n$, the computation of which is made
easy by rewriting $Z_n$ as follows.

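Recall the definition of ψ from (5.6a.2) and define

$a_i(y) := I(e_i \le y) + I(e_i < -y) - 1$, $y \in \mathbb R$, $\alpha_i := \int_0^\infty a_i\,d\psi$, $1 \le i \le n$;

$a := (a_1, \dots, a_n)'$, $\alpha := (\alpha_1, \dots, \alpha_n)'$, $\sigma^2 := \int_0^\infty f^2\,dG$.
Then
(15) $Z_n(y) = AX'[a(y) - f(y)\,\alpha\,\sigma^{-2}]$, $y \ge 0$.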


Now observe that under $H_s$, $Ea = 0$, $E a_i(x)a_i(y) = 2(1-F(y))$, $0 \le x \le y$,
and, because of the independence of the errors,

(16) $E a(x)a(y)' = 2(1-F(y))\, I_{n\times n}$, $0 \le x \le y$.
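The covariance formula $E a_i(x)a_i(y) = 2(1-F(y))$, $0 \le x \le y$, behind (16) is easy to confirm by simulation. The sketch below assumes standard normal errors (any continuous F symmetric around 0 would do) and compares Monte Carlo averages with the closed form; the sample size and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def a(e: float, y: float) -> int:
    """a(y) = I(e <= y) + I(e < -y) - 1 for a single error e."""
    return int(e <= y) + int(e < -y) - 1


def mc_moments(x: float, y: float, n: int = 200_000, seed: int = 1):
    """Monte Carlo estimates of E a(x) and E a(x)a(y) under N(0, 1) errors."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        s1 += a(e, x)
        s2 += a(e, x) * a(e, y)
    return s1 / n, s2 / n


x, y = 0.3, 1.0  # 0 <= x <= y
mean_hat, cov_hat = mc_moments(x, y)
cov_theory = 2.0 * (1.0 - NormalDist().cdf(y))  # 2(1 - F(y)), about 0.317
print(mean_hat, cov_hat, cov_theory)
```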
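Again, because of the symmetry and the continuity of F and Fubini, for $y \ge 0$,

$E a_1(y)\alpha_1 = \int_0^\infty E[\{I(e_1 \le y) + I(e_1 < -y) - 1\}\{I(e_1 \le x) + I(e_1 < -x) - 1\}]\,d\psi(x)$
$\quad = \int_0^\infty [F(x\wedge y) + F((-x)\wedge y) - F(y) + F(x\wedge(-y)) + F((-x)\wedge(-y)) - F(-y)]\,d\psi(x)$

$\quad = 2(1-F(y))\{\psi(y) - \psi(0)\} + \int_y^\infty 2(1-F(x))\,d\psi(x)$
$\quad = 2\int_y^\infty [\psi(x) - \psi(0)]\,dF(x) =: k(y)$, say.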


The last equality is obtained by integrating the second expression in the 
previous one by parts. From this and the independence of the errors, we 
obtain 


$E a(y)\alpha' = k(y)\, I_{n\times n}$, $y \ge 0$.
Similarly,
$E\alpha\alpha' = I_{n\times n}\, 4\int_0^\infty\int_x^\infty (1-F(y))\,d\psi(y)\,d\psi(x) =: I_{n\times n}\,\tau(F,G)$, say.
From these calculations one readily obtains that under $H_s$, for $0 \le x \le y$,
(17) $K_n(x, y) := E Z_n(x)Z_n(y)'$
$\quad = [2(1-F(y)) - k(y)f(x)\sigma^{-2} - k(x)f(y)\sigma^{-2} + f(x)f(y)\tau(F,G)\sigma^{-4}]\, I_{p\times p}$.


We also need the weak convergence of $W^+$ to a continuous Gaussian
process in the uniform topology. One way to prove this is as follows. By (16),

(18) $E W^+(x)W^+(y)' = 2(1-F(y))\, I_{p\times p}$, $0 \le x \le y$.
From the definition (6.1.9) and the symmetry of F,
(19) $W^+(y) = \sum_i A x_{ni}\{I(e_{ni} \le y) - I(-e_{ni} < y)\}$
$\quad = \sum_i A x_{ni}\{I(e_{ni} \le y) - F(y)\} - \sum_i A x_{ni}\{I(-e_{ni} \le y) - F(y)\}$
$\qquad + \sum_i A x_{ni} I(-e_{ni} = y)$
(20) $\quad =: \mathcal W(y) - \tilde{\mathcal W}(y) + \sum_i A x_{ni} I(-e_{ni} = y)$, say, $y \ge 0$.
Now, let $\mathbb W' := (\mathbb W_1, \dots, \mathbb W_p)$ be a vector of independent Wiener
processes on [0, 1] such that $\mathbb W_j(0) = 0$, $E\mathbb W_j = 0$, and $E\mathbb W_j(s)\mathbb W_j(t) = s \wedge t$, $1 \le j \le p$.
Note that

$E\,\mathbb W(2(1-F(x)))\,\mathbb W(2(1-F(y)))' = 2(1-F(y))\, I_{p\times p}$, $0 \le x \le y$.
From (18) and (19), it hence follows, with the aid of the L-F CLT and
the Cramér–Wold device, that under (NX), all finite dimensional
distributions of $W^+$ converge to those of $\mathbb W(2(1-F))$.

To prove the tightness in the uniform metric, proceed as follows. From
(20) and the triangle inequality, because of (NX), it suffices to show that $\mathcal W$
and $\tilde{\mathcal W}$ are tight. But by the symmetry and the continuity of F,

$\{\tilde{\mathcal W}(y), y \in \mathbb R\} \stackrel{d}{=} \{\mathcal W(y), y \in \mathbb R\} = \{\mathcal W(F^{-1}(t)), 0 \le t \le 1\}$.

But $\mathcal W(F^{-1})$ is obviously a p-vector of w.e.p.'s of the type $W_d$ specified at
(2.2a.33). Thus the tightness follows from (2.2a.35) of Corollary 2.2a.1. We
summarize this weak convergence result as


Lemma 6.5.1. Let F be a continuous d.f. that is symmetric around 0
and $\{e_{ni}, 1 \le i \le n\}$ be i.i.d. F r.v.'s. Assume that (NX) holds. Then,
$W^+(\cdot) \Rightarrow \mathbb W(2(1-F(\cdot)))$ in $(D[0, \infty], \rho)$. □


The above discussion suggests the approximating process for the $Z_n$ of
(14) to be

(21) $Z(y) := \mathbb W(2(1-F(y))) - f(y)\int_0^\infty \mathbb W(2(1-F))f\,dG \Big(\int_0^\infty f^2\,dG\Big)^{-1}$, $y \ge 0$.


Straightforward calculations show that $K_n(x, y) = E Z(x)Z(y)'$, $0 \le x \le y$, $n \ge 1$.
This then verifies (i), (ii) and (iv) of Lemma 6.3.1 in the present case.
Condition (iii) is verified as in the proof of Corollary 6.3.1(b) with the help of
Lemma 6.5.1. To summarize, we have 


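Corollary 6.5.1. (a) Under the conditions of Theorem 6.5.3(a),
(22) $\hat K_1^+ \to_d 2\int_0^\infty \Big[\mathbb W_1(2(1-F(y))) - f(y)\int_0^\infty \mathbb W_1(2(1-F))f\,dG \Big(\int_0^\infty f^2\,dG\Big)^{-1}\Big]^2 dG(y)$.
(b) Under the conditions of Theorem 6.5.3(b),

(23) $\hat K_2^+ \to_d 2\int_0^\infty \|Z(y)\|^2\,dG(y)$, with Z given at (21). □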


Remark 6.5.1. The distributions of the limiting r.v.'s in (22) and (23)
have been studied by Martynov (1975, 1976) and Boos (1982) for some F
and G. An interesting G in the present case is G = λ, the Lebesgue measure. But the
corresponding tests are not a.d.f. Also, because the F in $H_s$ is unknown,
one cannot use G = F or the Anderson–Darling integrating measure
$dG = dF/\{F(1-F)\}$ in these test statistics.

One way to overcome this problem would be to use the signed rank
analogues of the above tests, which is equivalent to replacing the F in the
integrating measure by an appropriate empirical d.f. of the residuals $\{Y_{nj} - x_{nj}'u;\ 1 \le j \le n\}$.
Let $R_{iu}$ denote the rank of $|Y_{ni} - x_{ni}'u|$ among $\{|Y_{nj} - x_{nj}'u|;\ 1 \le j \le n\}$, $1 \le i \le n$, and define

$\mathcal Z_1(t, u) := n^{-1/2}\sum_i I(R_{iu} \le nt)\,\mathrm{sgn}(Y_{ni} - x_{ni}'u)$,

$\mathcal Z(t, u) := A\sum_i x_{ni} I(R_{iu} \le nt)\,\mathrm{sgn}(Y_{ni} - x_{ni}'u)$, $0 \le t \le 1$, $u \in \mathbb R^p$.


The signed rank analogues of the $K_1^+$, $K_2^+$ statistics, respectively, are
$\hat{\mathcal K}_1 := \inf\{\mathcal K_1(u);\ u \in \mathbb R^p\}$, $\hat{\mathcal K}_2 := \inf\{\mathcal K_2(u);\ u \in \mathbb R^p\}$, where

$\mathcal K_1(u) := \int_0^1 \mathcal Z_1^2(t, u)\,dL(t)$, $\mathcal K_2(u) := \int_0^1 \|\mathcal Z(t, u)\|^2\,dL(t)$, $u \in \mathbb R^p$,
with $L \in \mathcal{DI}[0, 1]$. If $L(t) \equiv t$, then $\hat{\mathcal K}_j$, $j = 1, 2$, are analogues of the
Cramér–von Mises statistics. If L is specified by the relation
$dL(t) = \{1/t(1-t)\}dt$, then the corresponding tests would be Anderson–Darling
type tests of symmetry.
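For the Cramér–von Mises choice $L(t) \equiv t$, the statistic $\mathcal K_1(u)$ can be evaluated exactly for a given u, because $\mathcal Z_1(\cdot, u)$ is constant on each interval $[k/n, (k+1)/n)$, where it equals $n^{-1/2}$ times the sum of the signs of the k residuals with the smallest absolute values. The Python sketch below is illustrative only: the function names are hypothetical, ties among the $|Y_{ni} - x_{ni}'u|$ are broken by index order, and the outer minimization over u is left out.

```python
def signed_rank_partial_sums(residuals):
    """S_k = sum of sgn(r_i) over the k residuals with the smallest |r_i|, k = 0..n.

    For k/n <= t < (k+1)/n, Z_1(t, u) = n^{-1/2} * S_k when r_i = Y_ni - x_ni'u.
    Ties in |r_i| are broken by index order (an arbitrary choice).
    """
    order = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]))
    sums, s = [0], 0
    for i in order:
        r = residuals[i]
        s += (r > 0) - (r < 0)  # sgn(r)
        sums.append(s)
    return sums


def cvm_signed_rank(residuals):
    """K_1 = int_0^1 Z_1(t, u)^2 dt, i.e. dL(t) = dt, for the given residuals."""
    n = len(residuals)
    sums = signed_rank_partial_sums(residuals)
    # Each interval [k/n, (k+1)/n) has length 1/n and Z_1^2 = S_k^2 / n there.
    return sum(sums[k] ** 2 for k in range(n)) / n ** 2


def residuals_p1(y, x, u):
    """Residuals Y_ni - x_ni * u for a one-dimensional design (p = 1)."""
    return [yi - xi * u for yi, xi in zip(y, x)]


print(cvm_signed_rank(residuals_p1([1.0, -2.0, 3.0], [1.0, 1.0, 1.0], 0.0)))  # 1/9
```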


Note that if in (3.3.1) we put $d_{ni} = n^{-1/2}$, $X_{ni} = e_{ni}$, $F_{ni} = F$, then $Z_d$
of (3.3.1) reduces to $\mathcal Z_1$. Similarly, $\mathcal Z$ corresponds to a p-vector of
$Z_d$-processes of (3.3.1) whose jth component has $d_{ni}$ = (jth column of A)'$x_{ni}$
and the rest of the entities the same as above. Consequently, from (3.3.17)
and arguments like those used for Theorem 6.5.3, we can deduce the
following


Theorem 6.5.4. Assume that (1.1.1), $H_s$ and (NX) hold; L is a d.f. on
[0, 1], and F of $H_s$ satisfies (F1), (F2).

(a) If, in addition, (5.6a.10) and (5.6a.11) hold, then
(24) $\hat{\mathcal K}_1 \to_d \int_0^1 \Big[\mathbb W_1(t) - q^*(t)\int_0^1 \mathbb W_1 q^*\,dL \Big(\int_0^1 (q^*)^2\,dL\Big)^{-1}\Big]^2 dL(t)$.

(b) Under no additional assumptions,
(25) $\hat{\mathcal K}_2 \to_d \int_0^1 \Big\|\mathbb W(t) - q^*(t)\int_0^1 \mathbb W q^*\,dL \Big(\int_0^1 (q^*)^2\,dL\Big)^{-1}\Big\|^2 dL(t)$,
where $q^*(t) := 2[f(F^{-1}((t+1)/2)) - f(0)]$, $0 \le t \le 1$. □
Clearly this theorem covers the case $L(t) \equiv t$ but not the case where
$dL(t) = \{1/t(1-t)\}dt$. The problem of proving an analogue of the above theorem for
a general L is unsolved at the time of this writing. □


CHAPTER 7 


AUTOREGRESSION 


7.1. INTRODUCTION. 


The purpose of this chapter is to offer a unified functional approach to some 
aspects of robust estimation and goodness-of-fit testing problems in pth 
order autoregression (AR(p)) models. This approach is similar to that of the 
previous chapters in connection with linear regression models, thereby 
extending a statistical methodology to one of the most applied models with 
dependent observations. 

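As before, let F be a d.f. on $\mathbb R$, $p \ge 1$ be an integer, $\varepsilon_1, \varepsilon_2, \dots$ be i.i.d.
F r.v.'s and $Y_0 := (X_0, X_{-1}, \dots, X_{1-p})'$ be an observable random vector
independent of $\varepsilon_1, \varepsilon_2, \dots$. In an AR(p) model one observes $\{X_i\}$ satisfying

(1) $X_i = \rho_1 X_{i-1} + \cdots + \rho_p X_{i-p} + \varepsilon_i$, $1 \le i \le n$,
for some $\rho' = (\rho_1, \rho_2, \dots, \rho_p) \in \mathbb R^p$.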


Processes that play a fundamental role in the robust estimation of ρ in
this model are the randomly weighted residual empirical processes

(2) $T_j(x, t) := n^{-1}\sum_{i=1}^n g(X_{i-j})\, I(X_i - t'Y_{i-1} \le x)$, $x \in \mathbb R$, $t \in \mathbb R^p$, $1 \le j \le p$,

where g is a measurable function from $\mathbb R$ to $\mathbb R$ and $Y_{i-1} := (X_{i-1}, \dots, X_{i-p})'$,
$1 \le i \le n$. Let $T := (T_1, \dots, T_p)'$.

The generalized M- (GM) estimators of ρ, as proposed by Denby and
Martin (1979), are solutions t of the p equations

(3) $G_j(t) := \int \psi(x)\, T_j(dx, t) = 0$, $1 \le j \le p$,

where ψ is a nondecreasing bounded measurable function from $\mathbb R$ to $\mathbb R$. These
estimators are analogues of M-estimators of β in linear regression as
discussed in Chapter 4. Note that taking $g(x) = xI(|x| \le k) + k\,\mathrm{sgn}(x)I(|x| > k) = \psi(x)$
in (3) gives the Huber(k) estimators, and taking $g(x) = x = \psi(x)$ gives
the famous least squares estimator.
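To make (3) concrete, here is a minimal Python sketch for p = 1, assuming $g = \psi$ = the Huber function with k = 1.345 (a conventional tuning constant, not one prescribed by the text). Because $g(x)x \ge 0$ and ψ is nondecreasing, $G_1(t) = n^{-1}\sum_i g(X_{i-1})\psi(X_i - tX_{i-1})$ is nonincreasing in t, which allows a root to be located by bisection. The simulation settings are arbitrary.

```python
import random


def huber(x: float, k: float = 1.345) -> float:
    """Huber function: x on [-k, k], k * sgn(x) outside."""
    return max(-k, min(k, x))


def gm_score(t: float, xs, g=huber, psi=huber) -> float:
    """G_1(t) = n^{-1} sum_{i=1}^n g(X_{i-1}) psi(X_i - t X_{i-1}), xs = [X_0, ..., X_n]."""
    n = len(xs) - 1
    return sum(g(xs[i - 1]) * psi(xs[i] - t * xs[i - 1]) for i in range(1, n + 1)) / n


def gm_estimate(xs, lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Bisection for a root of the nonincreasing function t -> G_1(t)."""
    if gm_score(lo, xs) < 0 or gm_score(hi, xs) > 0:
        raise ValueError("no sign change of G_1 on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gm_score(mid, xs) > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_ar1(rho: float, n: int, seed: int = 0):
    """X_0 = 0 and X_i = rho X_{i-1} + eps_i with i.i.d. N(0, 1) errors."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n):
        xs.append(rho * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


xs = simulate_ar1(0.5, 2000)
print(gm_estimate(xs))  # should be close to 0.5
```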


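The m.d. estimator $\hat\rho^+$, which is an analogue of $\hat\beta^+$ of (5.2.20), is defined
as a minimizer, w.r.t. t, of

(4) $K^+(t) := \sum_{j=1}^p \int \Big[n^{-1/2}\sum_{i=1}^n g(X_{i-j})\{I(X_i \le x + t'Y_{i-1}) - I(-X_i < x - t'Y_{i-1})\}\Big]^2 dG(x)$, $t \in \mathbb R^p$.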
Observe that $K^+$ involves T. In fact, for all $t \in \mathbb R^p$,

$K^+(t) = \sum_{j=1}^p \int \Big[n^{1/2}\Big\{T_j(x, t) - n^{-1}\sum_{i=1}^n g(X_{i-j}) + T_j(-x, t)\Big\}\Big]^2 dG(x)$.


Three members of this class of estimators are of special interest. They
correspond to the cases $g(x) = x = G(x)$; $g(x) = x$, $G = \delta_0$, the measure
degenerate at 0; and $g(x) = x$, $G = F$ in the F known case. The first gives an
analogue of the Hodges–Lehmann (h.l.) estimator of ρ, the second gives the
least absolute deviation (l.a.d.) estimator, while the third gives an estimator
that is more efficient at logistic (double exponential) errors than the l.a.d. (h.l.)
estimator.

Another important process in the model (1) is the ordinary residual 
empirical process 


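(5) $F_n(x, t) := n^{-1}\sum_{i=1}^n I(X_i - t'Y_{i-1} \le x)$, $x \in \mathbb R$, $t \in \mathbb R^p$.

An estimator of F or a test of goodness-of-fit pertaining to F is usually
based on $F_n(x, \hat\rho)$, where $\hat\rho$ is an estimator of ρ.
Clearly $F_n$ is a special case of (2). But both $F_n$ and $T_j$, $1 \le j \le p$,
are special cases of

(6) $W_h(x, t) := n^{-1}\sum_{i=1}^n h(Y_{i-1})\, I(X_i - t'Y_{i-1} \le x)$
$\quad = n^{-1}\sum_{i=1}^n h(Y_{i-1})\, I(\varepsilon_i \le x + (t-\rho)'Y_{i-1})$, $x \in \mathbb R$, $t \in \mathbb R^p$,

where h is a measurable function from $\mathbb R^p$ to $\mathbb R$. Choosing $h(Y_{i-1}) = g(X_{i-j})$
in $W_h$ gives $T_j$, $1 \le j \le p$, and the choice $h \equiv 1$ yields $F_n$.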
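The process (6) is straightforward to compute from data. The Python sketch below is an illustration under stated conventions: the series is stored as `xs = [X_{1-p}, ..., X_0, X_1, ..., X_n]`, so that the first p entries form $Y_0$; `h` defaults to the constant 1, in which case the function returns $F_n(x, t)$ of (5), while `h = lambda Y: g(Y[j - 1])` gives the randomly weighted process $T_j$ of (2). The function names are hypothetical.

```python
def lagged_pairs(xs, p):
    """Pairs (X_i, Y_{i-1}) with Y_{i-1} = (X_{i-1}, ..., X_{i-p}), i = 1..n.

    xs = [X_{1-p}, ..., X_0, X_1, ..., X_n]; the first p entries form Y_0.
    """
    return [(xs[k], xs[k - p:k][::-1]) for k in range(p, len(xs))]


def W_h(x, t, xs, p, h=lambda Y: 1.0):
    """W_h(x, t) = n^{-1} sum_i h(Y_{i-1}) I(X_i - t'Y_{i-1} <= x)."""
    pairs = lagged_pairs(xs, p)
    total = 0.0
    for Xi, Y in pairs:
        resid = Xi - sum(tj * yj for tj, yj in zip(t, Y))
        if resid <= x:
            total += h(Y)
    return total / len(pairs)


xs = [0.0, 1.0, 0.5, -0.2]  # p = 1: X_0 = 0.0, then X_1, X_2, X_3
print(W_h(0.0, [0.5], xs, p=1))                    # F_n(0, 0.5) = 2/3
print(W_h(0.0, [0.5], xs, p=1, h=lambda Y: Y[0]))  # T_1 with g(x) = x: 0.5
```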


From the above discussion it is apparent that the investigation of the
large sample behavior of various inferential procedures pertaining to ρ and
F, based on $\{T_j\}$ and $F_n(\cdot, \hat\rho)$, is facilitated by the weak convergence
properties of $\{W_h(x, \rho + n^{-1/2}u),\ x \in \mathbb R,\ u \in \mathbb R^p\}$. This will be investigated in
Section 7.2, with the aid of Theorem 2.2b.1. In particular, this section
contains an a.u.l. result about $\{W_h(x, \rho + n^{-1/2}u),\ x \in \mathbb R,\ \|u\| \le B\}$ which in
turn yields a.u.l. results about $\{T(x, \rho + n^{-1/2}u),\ x \in \mathbb R,\ \|u\| \le B\}$ and
$\{F_n(x, \rho + n^{-1/2}u),\ x \in \mathbb R,\ \|u\| \le B\}$. These results are useful in studying GM- and R-
estimators of ρ, akin to Chapters 3 and 4 when dealing with linear regression
models. They are also useful in studying the large sample behaviour of some
tests of goodness-of-fit pertaining to F. Analogous results about the
ordinary empirical process of the residuals in autoregressive moving average models
are briefly discussed in Remark 7.2.4.

Generalized M-estimators and analogues of Jaeckel’s (1972) 
R-estimators are discussed in Section 7.3. In order to use R- or m.d. 
estimators to construct confidence intervals one often needs consistent
estimators of the functional Q(f) of the error density f. Appropriate
analogues of estimators of Q(f) of Section 4.5 are shown to be consistent
under (F1) and (F2). This is also done in Section 7.3, with the help of the
a.u.l. property of $\{F_n(x, \rho + n^{-1/2}u),\ x \in \mathbb R,\ \|u\| \le B\}$. This result is also used to
prove the a.u.l. of serial rank correlations of the residuals in an AR(p) model.
Such results should be useful in developing analogues of the method of
moments estimators or Yule–Walker equations based on ranks in AR(p)
models.


Section 7.4 investigates the behaviour of two classes of m.d. estimators
of ρ, including the class of estimators $\{\hat\rho^+\}$. A crucial result needed to
obtain the asymptotic distributions of these estimators is the asymptotic
uniform quadraticity of their defining dispersions. This result is also proved
in Section 7.4. Section 7.5 contains appropriate analogues of some of the
goodness-of-fit tests of Chapter 6 pertaining to F.


7.2. ASYMPTOTIC UNIFORM LINEARITY OF $W_h$ AND $F_n$.


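Recall the definition of the $V_n$ process from (1.4.1) and the statement of
Theorem 2.2b.1. In (1.4.1), let

(1) $\eta_{ni} = \varepsilon_i$, $h_{ni} = h(Y_{i-1})$, $\delta_{ni} = n^{-1/2}u'Y_{i-1}$, $u \in \mathbb R^p$, $1 \le i \le n$,
$\mathcal A_{n1}$ = σ-field$\{Y_0\}$, $\mathcal A_{ni}$ = σ-field$\{Y_0, \varepsilon_1, \dots, \varepsilon_{i-1}\}$, $2 \le i \le n$.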


Then one readily sees that the corresponding $V_h(x)$, $V_h^0(x)$ are, respectively,
equal to $W_h(x, \rho + n^{-1/2}u)$, $W_h(x, \rho)$ for each $u \in \mathbb R^p$ and for all $x \in \mathbb R$.
Consequently, if we let
(2) $\mu_h(x, t) := n^{-1}\sum_{i=1}^n h(Y_{i-1})F(x + (t-\rho)'Y_{i-1})$,

$\mathcal W_h(x, t) := n^{1/2}[W_h(x, t) - \mu_h(x, t)]$, $x \in \mathbb R$, $t \in \mathbb R^p$,

then the corresponding $U_h(x)$, $U_h^0(x)$ are, respectively, equal to $\mathcal W_h(x, \rho + n^{-1/2}u)$,
$\mathcal W_h(x, \rho)$ for each $u \in \mathbb R^p$ and for all $x \in \mathbb R$. Recall the conditions (F1)
and (F2) from Corollary 2.3.1. We are now ready to state and prove the
following


Theorem 7.2.1. In addition to (7.1.1), assume that the following 
conditions hold: 


(a1) h is a bounded function. 
(a2) $\max_{1 \le i \le n} n^{-1/2}\|Y_{i-1}\| = o_p(1)$.
(a3) $n^{-1}\sum_{i=1}^n \|h(Y_{i-1})Y_{i-1}\| = O_p(1)$.
(a4) F satisfies (F1) and (F2).

Then, for every $0 < B < \infty$,

(3) $\sup_{x \in \mathbb R,\ \|u\| \le B} |\mathcal W_h(x, \rho + n^{-1/2}u) - \mathcal W_h(x, \rho)| = o_p(1)$,

and

(4) $n^{1/2}[W_h(x, \rho + n^{-1/2}u) - W_h(x, \rho)] = u'n^{-1}\sum_{i=1}^n h(Y_{i-1})Y_{i-1}\, f(x) + u_p(1)$,

where $u_p(1)$ is a sequence of stochastic processes that converges to zero,
uniformly over the set $x \in \mathbb R$, $\|u\| \le B$, in probability.


Proof. In view of the discussion preceding the statement of the
theorem, it is clear that (2.2b.2) of Theorem 2.2b.1 applied to the entities given
in (1) above readily yields that

$\sup\{|\mathcal W_h(x, \rho + n^{-1/2}u) - \mathcal W_h(x, \rho)|;\ x \in \mathbb R\} = o_p(1)$ for every fixed $u \in \mathbb R^p$.

It is the uniformity with respect to u that requires an extra argument, and
that also turns out to be a consequence of another application of (2.2b.2) and
a monotonic property inherent in these processes, as we now show.

Since h is fixed, it will not be exhibited in the proof. Also, for
convenience, write $\mathcal W(\cdot)$, $\mathcal W_u(\cdot)$, $W_u(\cdot)$, $\mu_u(\cdot)$ etc. for $\mathcal W_h(\cdot, \rho)$,
$\mathcal W_h(\cdot, \rho + n^{-1/2}u)$, $W_h(\cdot, \rho + n^{-1/2}u)$, $\mu_h(\cdot, \rho + n^{-1/2}u)$ etc., and $\mathcal W_u^+$, $W_u^+$, $\mu_u^+$ etc.,
with + signifying the fact that $h^+$ now appears in the place of h in these processes, where $h^+ := 0 \vee h$,
$h^- := h^+ - h$. To avoid displays being broken into different lines often, write
$\xi_i$, $h_i$, $h_i^+$ for $Y_{i-1}$, $h(Y_{i-1})$, $h^+(Y_{i-1})$, respectively, $i \ge 1$. Thus, e.g.,

(5) $\mathcal W_u^+(x) = n^{-1/2}\sum_i h_i^+\{I(\varepsilon_i \le x + n^{-1/2}u'\xi_i) - F(x + n^{-1/2}u'\xi_i)\}$.


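We also need the following processes:

(6) $T^+(x; u, a) := n^{-1/2}\sum_i h_i^+ I(\varepsilon_i \le x + n^{-1/2}(u'\xi_i + a\|\xi_i\|))$,
$m^+(x; u, a) := n^{-1/2}\sum_i h_i^+ F(x + n^{-1/2}(u'\xi_i + a\|\xi_i\|))$,

$Z^+ := T^+ - m^+$, $x \in \mathbb R$, $u \in \mathbb R^p$, $a \in \mathbb R$.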
Observe that if in $U_n$ of (2.2b.1) we take $\eta_{ni} = \varepsilon_i$, $h_{ni} = h^+(\xi_i)$,
$\delta_{ni} = n^{-1/2}(u'\xi_i + a\|\xi_i\|)$ and $\mathcal A_{ni}$, $1 \le i \le n$, as in (1), we obtain

$U_n(\cdot) = Z^+(\cdot; u, a)$, for every $u \in \mathbb R^p$, $a \in \mathbb R$.

Similarly, if we take $\delta_{ni} = n^{-1/2}u'\xi_i$ and the rest of the quantities as above,
then

$U_n(\cdot) = Z^+(\cdot; u, 0) = \mathcal W_u^+(\cdot)$, for every $u \in \mathbb R^p$.

It thus follows from two applications of (2.2b.2) and the triangle inequality
that for every $u \in \mathbb R^p$, $a \in \mathbb R$,

(7a) $\sup_x |Z^+(x; u, a) - Z^+(x; u, 0)| = o_p(1)$,
(7b) $\sup_x |\mathcal W_u^+(x) - \mathcal W^+(x)| = o_p(1)$.


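Thus, to prove (3), because of the compactness of $M(B) := \{u \in \mathbb R^p;\ \|u\| \le B\}$, it suffices to show
that for every ε > 0 there is a δ > 0 such that for every $\|u\| \le B$,

(8) $\limsup_n P\Big(\sup_{s \in M(B),\ \|s - u\| \le \delta,\ x \in \mathbb R} |\mathcal W_s(x) - \mathcal W_u(x)| > 4\epsilon\Big) < \epsilon$.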


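By the definition of $\mathcal W$ and the triangle inequality, for $x \in \mathbb R$, $s, u \in \mathbb R^p$,
(9) $|\mathcal W_s(x) - \mathcal W_u(x)| \le |\mathcal W_s^+(x) - \mathcal W_u^+(x)| + |\mathcal W_s^-(x) - \mathcal W_u^-(x)|$,
$|\mathcal W_s^+(x) - \mathcal W_u^+(x)| \le n^{1/2}[|W_s^+(x) - W_u^+(x)| + |\mu_s^+(x) - \mu_u^+(x)|]$.
But $s \in M(B)$, $\|u\| \le B$, $\|s - u\| \le \delta$ imply that for all $1 \le i \le n$,
(10) $n^{-1/2}u'\xi_i - n^{-1/2}\delta\|\xi_i\| \le n^{-1/2}s'\xi_i \le n^{-1/2}u'\xi_i + n^{-1/2}\delta\|\xi_i\|$.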


From (10), the monotonicity of the indicator function and the nonnegativity 
of h*, we obtain 


$T^+(x; u, -\delta) - T^+(x; u, 0) \le n^{1/2}[W_s^+(x) - W_u^+(x)] \le T^+(x; u, \delta) - T^+(x; u, 0)$

for all $x \in \mathbb R$, $s \in M(B)$, $\|s - u\| \le \delta$. Now center $T^+$ appropriately to obtain




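(11) $n^{1/2}|W_s^+(x) - W_u^+(x)|$
$\quad \le |Z^+(x; u, \delta) - Z^+(x; u, 0)| + |Z^+(x; u, -\delta) - Z^+(x; u, 0)|$
$\qquad + |m^+(x; u, \delta) - m^+(x; u, 0)| + |m^+(x; u, -\delta) - m^+(x; u, 0)|$,
for all $x \in \mathbb R$, $s \in M(B)$, $\|s - u\| \le \delta$.
But, by (a4), for all $\|u\| \le B$,
(12) $\sup_x |m^+(x; u, \pm\delta) - m^+(x; u, 0)| \le \delta\|f\|_\infty\, n^{-1}\sum_i h_i^+\|\xi_i\|$,

(13) $\sup_{\|s - u\| \le \delta,\ x} n^{1/2}|\mu_s^+(x) - \mu_u^+(x)| \le \delta\|f\|_\infty\, n^{-1}\sum_i h_i^+\|\xi_i\|$.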
From (12), (11), (7a) applied with $a = \delta$ and $a = -\delta$, and the
assumption (a3), one concludes that for every ε > 0 there is a δ > 0 such
that for each $\|u\| \le B$,

$\limsup_n P\Big(\sup_{\|s - u\| \le \delta,\ x} n^{1/2}|W_s^+(x) - W_u^+(x)| > \epsilon\Big) \le \epsilon/2$.

From this, (13), (9), and (a3) one now concludes (8) in a routine fashion.
Finally, (4) follows from (3) and (a4) by Taylor's expansion of F. □


An application of (4) with $h(Y_{i-1}) = g(X_{i-j})$ and the rest of the
quantities as in (1) readily yields the a.u.l. property of the $T_j$-processes, $1 \le j \le p$,
of (7.1.2). This, together with integration by parts, yields the following
expansion of the M-scores $G_j$, $1 \le j \le p$, of (7.1.3).

Corollary 7.2.1. In addition to (7.1.1), (a2) and (a4), assume that the
following conditions hold.

(b1) g is bounded.
(b2) ψ is nondecreasing, bounded and $\int \psi\,dF = 0$.
(b3) $n^{-1}\sum_i \|g(X_{i-j})Y_{i-1}\| = O_p(1)$, $1 \le j \le p$.

Then, for all $0 < k, B < \infty$,
$\sup\Big|n^{1/2}[G_j(\rho + n^{-1/2}u) - G_j(\rho)] + u'n^{-1}\sum_i g(X_{i-j})Y_{i-1}\int f\,d\psi\Big| = o_p(1)$,

where the supremum is taken over all ψ with $\|\psi\|_{tv} \le k < \infty$, $\|u\| \le B$, $1 \le j \le p$. □
Upon choosing $h \equiv 1$ in (4) one obtains an analogous result for the
ordinary residual empirical process $F_n(x, t)$. Because of its importance and
for easy reference later on, we state it as a separate result. Observe that
in the following corollary the assumption (a3*) is nothing but the assumption
(a3) of Theorem 7.2.1 with $h \equiv 1$.

Corollary 7.2.2. Suppose that (7.1.1) holds. In addition, assume that
(a2), (a3*) and (a4) hold, where

(a3*) $n^{-1}\sum_{i=1}^n \|Y_{i-1}\| = O_p(1)$.
Then, for every $0 < B < \infty$,
(14) $\sup\Big|n^{1/2}\{F_n(x, \rho + n^{-1/2}u) - F_n(x, \rho)\} - u'n^{-1}\sum_{i=1}^n Y_{i-1}\, f(x)\Big| = o_p(1)$,
where the supremum is taken over $x \in \mathbb R$, $\|u\| \le B$. □
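The content of (14) can be seen in a small simulation. The Python sketch below assumes an AR(1) model with ρ = 0.5 and N(1, 1) errors (a nonzero error mean is chosen deliberately so that the linear term $u'n^{-1}\sum_i Y_{i-1}f(x)$ does not vanish), evaluates the supremum over a finite grid of x values only, and compares the raw increment $n^{1/2}\{F_n(x, \rho + n^{-1/2}u) - F_n(x, \rho)\}$ with the same increment after the linear term is subtracted. All settings and names are illustrative assumptions.

```python
import bisect
import math
import random


def simulate_ar1_normal_mean(rho, n, mean=1.0, seed=0):
    """X_0 = 0, X_i = rho X_{i-1} + eps_i, eps_i i.i.d. N(mean, 1)."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n):
        xs.append(rho * xs[-1] + rng.gauss(mean, 1.0))
    return xs


def residual_edf(xs, t):
    """Return x -> F_n(x, t) = n^{-1} #{i : X_i - t X_{i-1} <= x} (p = 1)."""
    res = sorted(xs[i] - t * xs[i - 1] for i in range(1, len(xs)))
    n = len(res)
    return lambda x: bisect.bisect_right(res, x) / n


def aul_check(xs, rho, u, grid, f):
    """Grid suprema of the raw and of the linearly corrected increment in (14)."""
    n = len(xs) - 1
    F0 = residual_edf(xs, rho)
    Fu = residual_edf(xs, rho + u / math.sqrt(n))
    ybar = sum(xs[:-1]) / n  # n^{-1} sum_{i=1}^n Y_{i-1}
    incr = [math.sqrt(n) * (Fu(x) - F0(x)) for x in grid]
    raw = max(abs(d) for d in incr)
    corrected = max(abs(d - u * ybar * f(x)) for d, x in zip(incr, grid))
    return raw, corrected


def normal_density(x, mean=1.0):
    """Density f of the assumed N(mean, 1) error distribution."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


xs = simulate_ar1_normal_mean(0.5, 20_000)
grid = [-3.0 + 0.01 * k for k in range(801)]
raw, corrected = aul_check(xs, 0.5, 2.0, grid, normal_density)
print(raw, corrected)  # the corrected supremum should be much smaller
```

With these settings the raw supremum stays of order $u \cdot E(X) \cdot \max f$, while the corrected one only reflects the $o_p(1)$ remainder in (14).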


Remark 7.2.1. Observe that none of the above results require that the
process $\{X_i\}$ be stationary or any of the moments be finite. □


Remark 7.2.2. Consider the assumptions (a2) and (a3). If $Y_0$ and
$\{\varepsilon_i\}$ are so chosen as to make $\{X_i\}$ stationary and ergodic, and if $E(\|Y_0\|^2 + \varepsilon_1^2) < \infty$,
then (a2) is a priori satisfied and (a1) implies (a3). See, e.g., Anderson
(1971; p 203). In particular, (a3) holds for the h corresponding to the Huber
function $h(x) = xI(|x| \le k) + k\,\mathrm{sgn}(x)I(|x| > k)$, $k > 0$.

Of course, if (a1) holds with the function h bounded in such a way that it
puts zero weight outside of compacts, then (a3) is trivially satisfied.

Observe that (a2) is weaker than requiring the finiteness of the second
moment. To see this, consider, for example, an AR(1) model where $X_0$ and
$\varepsilon_1, \varepsilon_2, \dots$ are independent r.v.'s and, for some $|\rho| < 1$,

$X_i = \rho X_{i-1} + \varepsilon_i$, $i \ge 1$.
Then,
$X_i = \rho^i X_0 + \sum_{j=0}^{i-1}\rho^j\varepsilon_{i-j}$, $i \ge 1$.
Thus, here (a2) is implied by

(i) $\max_{1 \le i \le n} n^{-1/2}|\varepsilon_i| = o_p(1)$.

But (i) is equivalent to showing that $\{1 - P(|\varepsilon_1| > x)\}^{x^2} \to 1$ as $x \to \infty$,
which, in turn, is equivalent to requiring that $x^2P(|\varepsilon_1| > x) \to 0$ as $x \to \infty$.

This last condition is weaker than requiring that $E|\varepsilon_1|^2 < \infty$. For example,
let the right tail of the distribution of $|\varepsilon_1|$ be given as follows:
$P(|\varepsilon_1| > x) = 1$, $x < 2$,
$\qquad = 1/(x^2\ln x)$, $x \ge 2$.
Then $E|\varepsilon_1| < \infty$, $E\varepsilon_1^2 = \infty$, yet $x^2P(|\varepsilon_1| > x) \to 0$ as $x \to \infty$. □
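The three claims about this example follow from the identity $E|\varepsilon_1|^r = \int_0^\infty r x^{r-1}P(|\varepsilon_1| > x)\,dx$, $r = 1, 2$; a short worked derivation:

```latex
E|\varepsilon_1| = \int_0^2 1\,dx + \int_2^\infty \frac{dx}{x^2 \ln x}
  \le 2 + \frac{1}{\ln 2}\int_2^\infty \frac{dx}{x^2}
  = 2 + \frac{1}{2\ln 2} < \infty,

E\varepsilon_1^2 = \int_0^2 2x\,dx + \int_2^\infty \frac{2\,dx}{x \ln x}
  = 4 + 2\big[\ln\ln x\big]_{2}^{\infty} = \infty,

x^2 P(|\varepsilon_1| > x) = \frac{1}{\ln x} \to 0 \quad (x \to \infty).
```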


Remark 7.2.3. An analogue of (14) was first proved by Boldin (1982)
requiring $\{X_i\}$ to be stationary, $E\varepsilon_1 = 0$, $E\varepsilon_1^2 < \infty$ and a uniformly
bounded second derivative of F. Corollary 7.2.2 is an improvement of
Boldin's result in the sense that F needs to be smooth only up to the first
derivative and the r.v.'s need not have a finite second moment.

Again, if $Y_0$ and $\{\varepsilon_i\}$ are so chosen that the Ergodic Theorem is
applicable and $E(Y_0) = 0$, then the coefficient $n^{-1}\sum_i Y_{i-1}$ of the linear term
in (14) will converge to 0, a.s. Thus (14) becomes

(14*) $\sup_{x \in \mathbb R,\ \|u\| \le B} |n^{1/2}\{F_n(x, \rho + n^{-1/2}u) - F_n(x, \rho)\}| = o_p(1)$.

In particular, this implies that if $\hat\rho$ is an estimator of ρ such that

$\|n^{1/2}(\hat\rho - \rho)\| = O_p(1)$,
then

$\|n^{1/2}\{F_n(\cdot, \hat\rho) - F_n(\cdot, \rho)\}\|_\infty = o_p(1)$.

Consequently, the estimation of ρ has an asymptotically negligible effect on the
estimation of the error d.f. F. This is similar to the fact, observed in the
previous chapter, that the estimation of the slope parameters in linear
regression has an asymptotically negligible effect on the estimation of the error
d.f. as long as the design matrix is centered at the origin. □


An important application of (14) occurs when proving the a.u.l.
property of the serial rank correlations of the residuals as functions of t.
More precisely, let $R_{it}$ denote the rank of $X_i - t'Y_{i-1}$ among $\{X_j - t'Y_{j-1},\ 1 \le j \le n\}$,
$1 \le i \le n$. Define $R_{it} = 0$ for $i \le 0$. Rank correlations of lag j, for $1 \le j \le p$, are
defined as

(15) $S_j(t) := \frac{12}{n(n^2-1)}\sum_{i=j+1}^n \Big(R_{i-j,t} - \frac{n+1}{2}\Big)\Big(R_{it} - \frac{n+1}{2}\Big)$, $t \in \mathbb R^p$,

$S := (S_1, \dots, S_p)'$.

Simple algebra shows that
$S_j(t) = a_n[L_j(t) - (n+1)^2/(4n^2)] + b_{nj}(t)$, $1 \le j \le p$,
where $a_n$ is a nonrandom sequence not depending on t, $|a_n| = O(1)$,

$b_{nj}(t) := \frac{12}{n(n^2-1)}\Big[\frac{n+1}{2}\Big(\sum_{i=1}^{j} R_{it} + \sum_{i=n-j+1}^{n} R_{it}\Big) - \frac{j(n+1)^2}{4}\Big]$,
and

$L_j(t) := n^{-3}\sum_{i=j+1}^n R_{i-j,t}R_{it}$, $1 \le j \le p$, $t \in \mathbb R^p$.

Observe that $\sup\{|b_{nj}(t)|;\ t \in \mathbb R^p\} \le 48p/n$, so that $n^{1/2}\sup\{|b_{nj}(t)|;\ t \in \mathbb R^p\}$
tends to zero, a.s. It thus suffices to prove the a.u.l. of $\{L_j\}$ only, $1 \le j \le p$.
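The decomposition of $S_j$ into a multiple of $L_j$ plus a small remainder can be checked numerically. The Python sketch below computes $S_j$ of (15) directly from the ranks of a residual vector, computes $L_j = n^{-3}\sum_{i=j+1}^n R_{i-j}R_i$, and uses $a_n = 12n^2/(n^2-1)$ with centering constant $(n+1)^2/(4n^2)$. These two constants are our own choice of normalization (they make the identity exact), so treat them as an assumption rather than the text's notation; the remainder is then compared with the bound $48p/n$.

```python
import random


def ranks(v):
    """Ranks 1..n of the entries of v (continuous data: no ties)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def serial_rank_corr(res, j):
    """S_j of (15) for residuals res = (r_1, ..., r_n)."""
    n, R, c = len(res), ranks(res), (len(res) + 1) / 2
    # 0-based index i runs over the 1-based range j+1, ..., n.
    return 12 / (n * (n * n - 1)) * sum((R[i - j] - c) * (R[i] - c) for i in range(j, n))


def L_stat(res, j):
    """L_j = n^{-3} sum_{i=j+1}^n R_{i-j} R_i."""
    n, R = len(res), ranks(res)
    return sum(R[i - j] * R[i] for i in range(j, n)) / n ** 3


def remainder(res, j):
    """b_nj = S_j - a_n [L_j - (n+1)^2/(4n^2)] with a_n = 12 n^2/(n^2 - 1)."""
    n = len(res)
    a_n = 12 * n * n / (n * n - 1)
    return serial_rank_corr(res, j) - a_n * (L_stat(res, j) - (n + 1) ** 2 / (4 * n * n))


rng = random.Random(3)
res = [rng.gauss(0.0, 1.0) for _ in range(200)]
for j in (1, 2, 3):
    print(j, serial_rank_corr(res, j), remainder(res, j), 48 * 3 / len(res))
```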
In order to state the a.u.l. result we need to introduce 


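(16) $Z_{ij} := f(\varepsilon_{i-j})F(\varepsilon_i) + f(\varepsilon_i)F(\varepsilon_{i-j})$, $i > j$; $:= 0$, $i \le j$,
$U_{ij} := Y_{i-j-1}F(\varepsilon_i)f(\varepsilon_{i-j}) + Y_{i-1}f(\varepsilon_i)F(\varepsilon_{i-j})$, $i > j$; $:= 0$, $i \le j$,
$\bar Z_j := n^{-1}\sum_{i=1}^n Z_{ij}$, $\bar U_j := n^{-1}\sum_{i=1}^n U_{ij}$, $1 \le j \le p$,
$\bar Y_n := n^{-1}\sum_{i=1}^n Y_{i-1}$.

Observe that $\{Z_{ij}\}$ are bounded r.v.'s with $EZ_{ij} = \int f^2(x)\,dx$ for all
$i > j$. Moreover, $\varepsilon_i$ i.i.d. F imply that $\{Z_{ij},\ j < i \le n\}$ are stationary and
ergodic. By the Ergodic Theorem,

$\bar Z_j \to b(f) := \int f^2(x)\,dx$, a.s., $j = 1, \dots, p$.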
We are now ready to state and prove 


Theorem 7.2.2. Assume that (7.1.1), (a2), (a3*) and (a4) hold. Then,
for every $0 < B < \infty$ and for every $1 \le j \le p$,

(17) $\sup_{\|u\| \le B}\big|n^{1/2}[L_j(\rho + n^{-1/2}u) - L_j(\rho)] - u'[b(f)\bar Y_n - \bar U_j]\big| = o_p(1)$.

If (a2) and (a3*) are strengthened to requiring $E(\|Y_0\|^2 + \varepsilon_1^2) < \infty$ and $\{X_i\}$
stationary and ergodic, then $\bar Y_n$ and $\bar U_j$ may be replaced by their respective
expectations in (17).


Proof. Fix a j in $1 \le j \le p$. For the sake of simplicity of the exposition,
write L(u), L(0) for $L_j(\rho + n^{-1/2}u)$, $L_j(\rho)$, respectively. Apply a similar
convention to other functions of u. Also write $\varepsilon_{iu}$ for $\varepsilon_i - n^{-1/2}u'Y_{i-1}$
and $F_n(\cdot)$ for $F_n(\cdot, \rho)$. With these conventions $R_{iu}$ is now the rank of
$X_i - (\rho + n^{-1/2}u)'Y_{i-1} = \varepsilon_{iu}$. In other words, $R_{iu} = nF_n(\varepsilon_{iu}, u)$ and

$L(u) = n^{-1}\sum_{i=j+1}^n F_n(\varepsilon_{i-j,u}, u)\,F_n(\varepsilon_{iu}, u)$, $u \in \mathbb R^p$.


The proof is based on the linearity properties of $F_n(\cdot, u)$ as given in (14) of
Corollary 7.2.2 above. In fact, if we let

$B_n(x, u) := F_n(x, u) - F_n(x) - n^{-1/2}u'\bar Y_n f(x)$, $x \in \mathbb R$,
then (14) is equivalent to
$\sup n^{1/2}|B_n(x, u)| = o_p(1)$.


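All suprema, unless specified otherwise, in the proof are over $x \in \mathbb R$, $1 \le i \le n$
and/or $\|u\| \le B$. Rewrite

$n^{1/2}(L(u) - L(0))$
$\quad = n^{-1/2}\sum_i\{F_n(\varepsilon_{i-j,u}, u)F_n(\varepsilon_{iu}, u) - F_n(\varepsilon_{i-j})F_n(\varepsilon_i)\}$
$\quad = n^{-1/2}\sum_i\big[\{B_n(\varepsilon_{i-j,u}, u) + F_n(\varepsilon_{i-j,u}) + n^{-1/2}u'\bar Y_n f(\varepsilon_{i-j,u})\}$
$\qquad \times \{B_n(\varepsilon_{iu}, u) + F_n(\varepsilon_{iu}) + n^{-1/2}u'\bar Y_n f(\varepsilon_{iu})\} - F_n(\varepsilon_{i-j})F_n(\varepsilon_i)\big]$.
Hence, from (14), (a2) and (a3*),
(18) $n^{1/2}(L(u) - L(0))$
$\quad = n^{-1/2}\sum_i[F_n(\varepsilon_{i-j,u})F_n(\varepsilon_{iu}) - F_n(\varepsilon_{i-j})F_n(\varepsilon_i)]$
$\qquad + n^{-1}\sum_i[F_n(\varepsilon_{i-j,u})f(\varepsilon_{iu}) + F_n(\varepsilon_{iu})f(\varepsilon_{i-j,u})](u'\bar Y_n) + u_p(1)$,

where, now, $u_p(1)$ is a sequence of stochastic processes converging to zero
uniformly, in probability, over the set $\{u;\ \|u\| \le B\}$.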

Now recall that (a4) and the asymptotic uniform continuity of the
standard empirical process based on i.i.d. r.v.'s imply that

$\sup_{|x - y| \le \delta} n^{1/2}|[F_n(x) - F(x)] - [F_n(y) - F(y)]| = o_p(1)$

when first $n \to \infty$ and then $\delta \to 0$. Hence, from (a2) and the fact that




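$\sup_{i,u}|\varepsilon_{iu} - \varepsilon_i| \le Bn^{-1/2}\max_i\|Y_{i-1}\|$
one readily obtains
$\sup_{i,u} n^{1/2}|[F_n(\varepsilon_{iu}) - F(\varepsilon_{iu})] - [F_n(\varepsilon_i) - F(\varepsilon_i)]| = o_p(1)$.
From this and (a4) we obtain
(19) $n^{1/2}[F_n(\varepsilon_{iu}) - F_n(\varepsilon_i)] = -u'Y_{i-1}f(\varepsilon_i) + (1 + \|Y_{i-1}\|)\,u_p(1)$, with $u_p(1)$ uniform in $1 \le i \le n$, $\|u\| \le B$.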


From (18), (19), the uniform continuity of f and F, and the Glivenko–Cantelli
lemma, one obtains

(20) $n^{1/2}(L(u) - L(0))$
$\quad = n^{-1}\sum_i[F(\varepsilon_{i-j})f(\varepsilon_i) + F(\varepsilon_i)f(\varepsilon_{i-j})](u'\bar Y_n)$
$\qquad - u'n^{-1}\sum_i\{Y_{i-j-1}f(\varepsilon_{i-j})F(\varepsilon_i) + Y_{i-1}f(\varepsilon_i)F(\varepsilon_{i-j})\} + u_p(1)$.
In concluding (20) we also used the fact that, by (a2) and (a3*),
$\sup n^{-3/2}\sum_i|u'Y_{i-j-1}|\,|u'Y_{i-1}| \le B^2 n^{-1/2}\max_i\|Y_{i-1}\|\; n^{-1}\sum_i\|Y_{i-1}\| = o_p(1)$.

Now (17) readily follows from (20) and the notation introduced just before
the statement of the theorem. The rest is obvious. □


Remark 7.2.4. Autoregressive moving average models. Boldin (1989) 
and Kreiss (1991) give an analogue of (14*) for a moving average model of 
order q and an autoregressive moving average model of order (p,q) 
(ARMA(p,q)), respectively, when the error d.f. F has zero mean, finite
second moment and bounded second derivative. Here we shall illustrate
how Theorem 2.2b.1 can be used to yield the same result under weaker
conditions on F. For the sake of clarity, the details are carried out for an
ARMA(1,1) model only.

Let $\varepsilon_0, \varepsilon_1, \varepsilon_2, \dots$ be i.i.d. F r.v.'s and $X_0$ be a r.v. independent of $\{\varepsilon_i,
i \ge 1\}$. Consider the process given by the relation

(21) $X_i = \rho X_{i-1} + \varepsilon_i + \beta\varepsilon_{i-1}$, $i \ge 1$,

where $|\rho| < 1$, $|\beta| < 1$. One can rewrite this model as

(22) $\varepsilon_1 = X_1 - (\rho X_0 + \beta\varepsilon_0)$,
$\varepsilon_i = X_i - \sum_{j=0}^{i-2}(-\beta)^j(\rho + \beta)X_{i-j-1} - (-\beta)^{i-1}(\rho X_0 + \beta\varepsilon_0)$, $i \ge 2$.
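The equivalence of the recursion $\varepsilon_i = X_i - \rho X_{i-1} - \beta\varepsilon_{i-1}$ implied by (21) and the closed form (22) can be checked numerically. The Python sketch below (function names are illustrative) simulates (21) with N(0, 1) errors, recovers the errors by both routes, and also works with trial values (s, t) in place of (ρ, β), which is how the residuals used below arise.

```python
import random


def arma11_simulate(rho, beta, n, x0=0.0, seed=0):
    """Return xs = [X_0, ..., X_n] and eps = [eps_0, ..., eps_n] from (21)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    xs = [x0]
    for i in range(1, n + 1):
        xs.append(rho * xs[i - 1] + eps[i] + beta * eps[i - 1])
    return xs, eps


def residuals_recursive(xs, eps0, s, t):
    """e_i = X_i - s X_{i-1} - t e_{i-1}, i = 1..n, started from e_0 = eps0."""
    out, prev = [], eps0
    for i in range(1, len(xs)):
        prev = xs[i] - s * xs[i - 1] - t * prev
        out.append(prev)
    return out


def residuals_closed_form(xs, eps0, s, t):
    """Representation (22) with (rho, beta) replaced by (s, t)."""
    n = len(xs) - 1
    start = s * xs[0] + t * eps0
    out = [xs[1] - start]
    for i in range(2, n + 1):
        acc = sum((-t) ** j * (s + t) * xs[i - j - 1] for j in range(i - 1))
        out.append(xs[i] - acc - (-t) ** (i - 1) * start)
    return out


xs, eps = arma11_simulate(0.6, -0.3, 50)
rec = residuals_recursive(xs, eps[0], 0.6, -0.3)
print(max(abs(a - b) for a, b in zip(rec, eps[1:])))  # errors recovered
```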
Let $\theta := (s, t)'$ denote a point in the open square $(-1, 1)^2$ and $\theta_0 := (\rho, \beta)'$
denote the true parameter value. Assume that the θ's are restricted to the
following sequence of neighborhoods: For a $b \in (0, \infty)$,

(23) $n^{1/2}\{|s - \rho| + |t - \beta|\} \le b$.
Let $\{\hat\varepsilon_i, i \ge 1\}$ stand for the residuals $\{\varepsilon_i, i \ge 1\}$ of (22) after ρ and
β are replaced by s and t, respectively, in (22). Let $F_n(\cdot, \theta)$ denote the
empirical process of $\{\hat\varepsilon_i, 1 \le i \le n\}$. This empirical can be rewritten as

(24) $F_n(x, \theta) = n^{-1}\sum_{i=1}^n I(\varepsilon_i \le x + \delta_{ni})$, $x \in \mathbb R$,
where
(25) $\delta_{n1} := (s - \rho)X_0 + (t - \beta)\varepsilon_0$,
$\delta_{ni} := \sum_{j=0}^{i-2}[(-t)^j(s + t) - (-\beta)^j(\rho + \beta)]X_{i-j-1}$
$\qquad + (-t)^{i-1}(sX_0 + t\varepsilon_0) - (-\beta)^{i-1}(\rho X_0 + \beta\varepsilon_0)$
$\quad =: \delta_{ni1} + \delta_{ni2}$, say, $i \ge 2$.
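From (25), it follows that for every $\theta \in (-1, 1)^2$ satisfying (23),
$|\delta_{n1}| \le bn^{-1/2}(|X_0| + |\varepsilon_0|)$,
$\max_{2 \le i \le n}|\delta_{ni1}| \le 2bn^{-1/2}\max_{1 \le i \le n}|X_i|\,(1 - bn^{-1/2} - |\beta|)^{-1}\{1 + (1 - |\beta|)^{-1}\}$,
$\max_{2 \le i \le n}|\delta_{ni2}| \le 2n^{-1/2}(1 - bn^{-1/2} - |\beta|)^{-1}(|X_0| + |\varepsilon_0|)$.
Consequently, if $n^{-1/2}\max_{1 \le i \le n}|X_i| = o_p(1)$, then the $\{\delta_{ni}\}$ of (25) would
satisfy (2.2b.A2) for every $\theta \in (-1, 1)^2$. But by (21),
(26) $X_1 = \rho X_0 + \beta\varepsilon_0 + \varepsilon_1$,
$X_i = \rho^{i-1}(\rho X_0 + \beta\varepsilon_0) + \sum_{j=0}^{i-2}\rho^j(\rho + \beta)\varepsilon_{i-j-1} + \varepsilon_i$, $i \ge 2$.
Therefore, (2.2b.A2) will hold for the above $\{\delta_{ni}\}$ if

(27) $n^{-1/2}\max_{1 \le i \le n}|\varepsilon_i| = o_p(1)$.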
We now verify (2.2b.A3) for the above $\{\delta_{ni}\}$ and with $h_{ni} \equiv 1$. That is,
we must show that $n^{-1/2}\sum_i|\delta_{ni}| = O_p(1)$. We proceed as follows. Let
$u = n^{1/2}(s - \rho)$, $v = n^{1/2}(t - \beta)$ and $Z_0 := |X_0| + |\varepsilon_0|$. By (23), $|u| + |v| \le b$.
From (25),

(28) $n^{-1/2}\sum_{i=1}^n|\delta_{ni}| \le n^{-1}bZ_0$
$\qquad + n^{-1/2}\sum_{i=2}^n\Big|\sum_{j=0}^{i-2}[(-t)^j(s + t) - (-\beta)^j(\rho + \beta)]X_{i-j-1}\Big|$
$\qquad + n^{-1/2}\sum_{i=2}^n|(-t)^{i-1}(sX_0 + t\varepsilon_0) - (-\beta)^{i-1}(\rho X_0 + \beta\varepsilon_0)|$
$\quad =: A_{n1} + A_{n2} + A_{n3}$, say.
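Clearly, $|A_{n1}| = o(1)$, a.s. Rewrite

$A_{n2} = n^{-1/2}\sum_{i=2}^n\Big|\sum_{j=0}^{i-2}[(-t)^j\{(u + v)n^{-1/2} + \rho + \beta\} - (-\beta)^j(\rho + \beta)]X_{i-j-1}\Big|$
$\quad \le bn^{-1}\sum_{i=2}^n\sum_{j=0}^{i-2}|t|^j|X_{i-j-1}| + 2n^{-1/2}\sum_{i=2}^n\sum_{j=0}^{i-2}|(-t)^j - (-\beta)^j|\,|X_{i-j-1}|$
(29) $\quad =: bA_{n21} + 2A_{n22}$, say.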
By a change of variables and an interchange of summations one obtains

(30) $A_{n21} \le n^{-1}\sum_{i=1}^n|X_i|\,(1 - |t|)^{-1}$.

Next, use the expansion $a^j - c^j = (a - c)\sum_{k=0}^{j-1}a^{j-1-k}c^k$ for any real numbers
a, c, to obtain
$A_{n22} \le bn^{-1}\sum_{i=2}^n\sum_{j=1}^{i-2}\sum_{k=0}^{j-1}|t|^{j-1-k}|\beta|^k|X_{i-j-1}|$.

Again, use change of variables and interchange of summations repeatedly and
the fact that $|\beta| \vee |t| < 1$, to conclude that this upper bound is bounded
above by

$b(1 - |\beta|)^{-1}(1 - |t|)^{-1}n^{-1}\sum_{i=1}^n|X_i|$.


This, (28) and (29) together with (23) imply that
(31) $A_{n2} \le 2bn^{-1}\sum_{i=1}^n|X_i|\,(1 - bn^{-1/2} - |\beta|)^{-1}\{1 + (1 - |\beta|)^{-1}\}$.
Finally, similar calculations show that

(32) $A_{n3} = O_p(n^{-1/2})$.

From (28), (31) and (32) it thus follows that if $n^{-1}\sum_{i=1}^n|X_i| = O_p(1)$,
then the $\{\delta_{ni}\}$ of (25) will satisfy (2.2b.A3) with $h_{ni} \equiv 1$. But in view of
(26) and the assumption that $|\rho| \vee |\beta| < 1$, it readily follows that if

(33) $n^{-1}\sum_{i=1}^n|\varepsilon_i| = O_p(1)$,

then (2.2b.A3) with $h_{ni} \equiv 1$ holds for the $\{\delta_{ni}\}$ of (25). We have thus
proved the following:

If (21) holds with the error d.f. F satisfying (F1), (F2), (27) and (33),
then for all $\theta \in (-1, 1)^2$,

$\sup_x\Big|n^{-1/2}\sum_{i=1}^n\{I(\varepsilon_i \le x + \delta_{ni}) - I(\varepsilon_i \le x) - F(x + \delta_{ni}) + F(x)\}\Big| = o_p(1)$.

Now use an argument like the one used in the proof of Theorem 7.2.1 to
conclude the following

Corollary 7.2.3. In addition to (21), assume that the error d.f. F
satisfies (F1), (F2), (27) and (33). Then, for all $0 < b < \infty$,


$\sup\Big|n^{1/2}[F_n(x, \theta) - F_n(x, \theta_0)] - n^{-1/2}\sum_{i=1}^n\delta_{ni}\,f(x)\Big| = o_p(1)$,

where the supremum is taken over $x \in \mathbb R$ and θ satisfying (23).
If (33) is strengthened to assuming that $E|\varepsilon_1| < \infty$, then

$\sup\Big|n^{-1/2}\sum_{i=1}^n\delta_{ni} - n^{1/2}\mu[(s - \rho)(1 - \rho)^{-1} + (t - \beta)(1 + \beta)^{-1}]\Big| = o_p(1)$,

where the supremum is taken over s, t satisfying (23), and $\mu = E\varepsilon_1$. □

Consequently, if $E\varepsilon_1 = 0$ and $(\hat\rho, \hat\beta)$ is an estimator of $(\rho, \beta)$ such that
$\|n^{1/2}(\hat\rho - \rho, \hat\beta - \beta)\| = O_p(1)$, then an analogue of (14*) holds in the present
case also, under weaker conditions than those given by Boldin or Kreiss.
The details for proving an analogue of Corollary 7.2.3 for a general
ARMA(p,q) model are similar to, but somewhat more complicated than, those given
above. □


7.3. GM- AND R-ESTIMATORS.


In this section we shall discuss the asymptotic distributions of GM- and R-
estimators of ρ. In addition, some consistent estimators of the functional
Q(f) will also be constructed. We begin with


7.3a. GM-Estimators. 


Here we shall state the asymptotic normality of the GM-estimators. Let $\hat\rho_M$
stand for a solution of (7.1.3) such that $\|n^{1/2}(\hat\rho_M - \rho)\| = O_p(1)$. That such
an estimator $\hat\rho_M$ exists can be seen by an argument similar to the one given
in Huber (1981) in connection with the linear regression model. To state the
asymptotic normality of $\hat\rho_M$ we need to introduce some more notation. Let


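(1) $\mathbb X := \begin{pmatrix} X_0 & X_{-1} & \cdots & X_{1-p} \\ X_1 & X_0 & \cdots & X_{2-p} \\ \vdots & \vdots & & \vdots \\ X_{n-1} & X_{n-2} & \cdots & X_{n-p} \end{pmatrix}$, $\quad \mathcal G := \begin{pmatrix} g(X_0) & g(X_{-1}) & \cdots & g(X_{1-p}) \\ g(X_1) & g(X_0) & \cdots & g(X_{2-p}) \\ \vdots & \vdots & & \vdots \\ g(X_{n-1}) & g(X_{n-2}) & \cdots & g(X_{n-p}) \end{pmatrix}$,

$\mathbf G := n^{1/2}(G_1(\rho), \dots, G_p(\rho))'$, $\quad B_n := \mathbb X'\mathcal G = \sum_i (g(X_{i-1})Y_{i-1}, \dots, g(X_{i-p})Y_{i-1})$.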


Proposition 7.3.1. In addition to (7.1.1), (7.2.a2), (7.2.a4), (7.2.b1),
(7.2.b2) and (7.2.b3), assume that

(b4) $n^{-1}B_n = B + o_p(1)$, for some p×p non-random positive
definite matrix B.

Then
$n^{1/2}(\hat\rho_M - \rho) = \big(B'\textstyle\int f\,d\psi\big)^{-1}\mathbf G + o_p(1)$.
If, in addition, we assume that

(b5) $n^{-1}\mathcal G'\mathcal G = G^* + o_p(1)$, $G^*$ a p×p non-random positive
definite matrix,

then
$n^{1/2}\Sigma^{-1/2}(\hat\rho_M - \rho) \to_d N(0, I)$, $\quad \Sigma := \big(\textstyle\int f\,d\psi\big)^{-2}\big(\textstyle\int \psi^2\,dF\big)(B')^{-1}G^*B^{-1}$.


Proof. Follows from Corollary 7.2.1, the Cramér–Wold device and
Lemma A.3 in the appendix applied to $\mathbf G$. □

Again, if $Y_0$ and $\{\varepsilon_i\}$ are so chosen as to make $\{X_i\}$ stationary and
ergodic, and $E(\|Y_0\|^2 + \varepsilon_1^2) < \infty$, then (b4) and (b5) are a priori satisfied. See,
e.g., Anderson (1971; p 203).

Note: For a more general class of GM-estimators see Bustos (1982), where a
result analogous to the above proposition for smooth score functions ψ is obtained. □


7.3b. R-Estimators. 


This section will discuss an analogue of Jaeckel's (1972) R-estimators of ρ
and their large sample properties.

Recall that $R_{it}$ is the rank of $X_i - t'Y_{i-1}$ among $\{X_k - t'Y_{k-1},\ 1 \le k \le n\}$,
for $1 \le i \le n$. Also, $R_{it} = 0$ for $i \le 0$. Let φ be a nondecreasing
score function from [0, 1] to the real line such that

(1) $\sum_{i=1}^n \varphi(i/(n+1)) = 0$.

For example, if $\varphi(t) = -\varphi(1-t)$ for all $t \in [0, 1]$, i.e., if φ is skew symmetric,
then it satisfies (1). Define
$S_j(t) := n^{-1}\sum_{i=j+1}^n X_{i-j}\,\varphi(R_{it}/(n+1))$, $1 \le j \le p$, $t \in \mathbb R^p$,
$S := (S_1, \dots, S_p)'$.

The class of rank statistics S, one for each φ, is an analogue of the class of
rank statistics discussed in Section 4.3 above in connection with linear
regression models, where one replaces the weights $\{X_{i-j}\}$ by appropriate
design points. A test of the hypothesis $\rho = \rho_0$ may be based on a suitably
standardized $S(\rho_0)$, large values of the statistic being significant.

It is thus natural to define R-estimators of p by the relationship 


(2) j, = arg min{||S{t)||; t € RP}. 
An alternative way to define R-estimators of p is to adapt Jaeckel 
(1972) to the AR(p) situation. Accordingly, for a teR?, let 


Zi(t) = Xy - t’ Yu-1, 1 <¢k <2, 
Zi) (t) := the ith largest residual among {Z,(t), 1<k<n}, 1<i<n, 
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R-Estimators 


n n 
At) = 3 pli/(nel)) Zc (t) = 3 o(Rie/(m+1))(Xi — Yi), 
Then Jaeckel’s estimator Pp; is defined by the relation 
p, = arg min{ A(t); t € RP}. 


Jaeckel’s argument about the existence of an analogue of p, in the 


context of linear regression model can be adapted to the present situation. 
This follows from the following three lemmas, the first of which is of a 
general interest. 


Lemma 7.3b.1. Let dj, do, ..., dn, V1, V2, ---» Vn, be real numbers such 
that not all {di} are the same and no two {vi} are the same. Let riy 
denote the rank of vi;—ud; among {vj —udj;1<j<n},ueR. Let {b,(i); 
1<i<n} bea set of real numbers that are nondecreasing in i. Let 


T(u) =) di ba(riu), ue. 


Then, T(u) is a nonincreasing step function in all those ueR for which 
there are no ties among {vj —udj; 1< j< n}. 


Proof. See Theorem II.7E, p35 of Hajek (1969). o 


Lemma 7.3b.2. Assume that the model (7.1.1) holds with (Yo, X1, Xa, 
...» Xn) having a continuous joint distribution. Then the following hold. 


(a) For each realization (Yo, X1, X2, ..., Xn), the assumption (1) implies 
that f(t) is nonnegative, continuous and convex function of t with tts 
a.e. derivative equal to —nS(t). 

(b) If the realization (Yo, X41, Xo, ..., Xn) 18 such that the rank of & is p 


then, for every 0<b <a, the set {teR?; Y(t) < b} is bounded, where 
% 1s the & of (7.3a.1), centered at the origin. 


Proof. (a). For any x’ = (x1, Xa, ..., Xn)ER", let x(1)<x(2)<....<x(n) 
denote the ordered xj, X2, .... Xn. Let I := {x = (m, m, ..., Mm)’; wa 
permutation of the integers 1, 2, ..., n.}, bn{i) := y{i/(m+1)), 1 <i <n, and 
define 


D(x) := Dy ba(i) x(i), D(x) = by ba(i) x x ER, 


Ty 
k := min{1l<j<n; ba(j) > O}. 
Observe that 4(t) = D(Z(t)). 
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Now, (1) and y nondecreasing implies that 


D(x) = 3 bn(i) (x(i) —x(0) 


=1 


od 


-1 


_ Ry ba(i) (x(i) — x(k)) + by bn(i) (x(i) — x(k)) 
> 0, Vx E R™, 


because each summand is nonnegative. This proves that 4(t)>0, teR. 
By Theorem 368 of Hardy, Littlewood and Polya (1952), 


D(x) = max 1 D(x), ¥V xe€ hk". 
Therefore, V t € RP, 
(*) A(t) = D(Z(t)) = max, 1 D_(Z(t)) 
n 
= max et PF ba(i)(X,, — ie Ore) 


This shows that s(t) is a maximal element of a finite number of 
continuous and convex functions, which itself is continuous and convex. The 
statement about a.e. differential being —nS(t) is obvious. This completes 
the proof of (a). 


(b) Without the loss of generality assume b > 4(0). Writea teR? as 
t = uO, wel, OeR?, ||6]]) = 1. Let dj = 0 Yj. The assumptions about % 
imply that not all {dj} are equal. Rewrite 
n n 
Alt) = Aud) = % bali) (X —ud)(i) = 2 ba(tiu)(Xi — udi) 


where now [iy is the rank of Xj;— ud; among {Xj — udj; 1 < j<n}. From 
(*) above, it follows that ¥(u9) is linear and convex in u, for every eR’, 


n 

||| = 1. Its a.e. derivative w.r.t. u is = dib,(tiu), which by Lemma 7.3b.1 
1= 

and because of the assumed continuity, is nondecreasing in u and eventually 


positive. Hence (ud) will eventually exceed b, for every OcR?, || @| = 1. 
Thus, there exists a u, such that s(u,f) > b. Since ¥ is continuous, 


there is an open set O, of unit vectors v, containing @ such that J(u) 
>b. Since b> (0), and fis convex, J(uv) > b, V ud>u,and V KO, 
Now, for each unit vector @, there is an open set O 9 covering it. Since the 


unit sphere is compact, a finite number of these sets covers it. Let m be the 
maximum of the corresponding finite set of u 9 Then for all u > m, for all 


unit vectors v, (uv) > b. This proves the claim (b) and also the lemma. o 
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Note: Lemma 7.3b.2 and tts proof is an adaptation of Theorems 1 and 
2 of Jaeckel (1972) to the present case. o 
From the above lemma it follows that if the r.v.’s Yo, X1, Xo, ..., Xn 
are continuous and the matrix n 3,(Yi-4 — Y)(¥i-,-— Y)’ is a.s. positive 
definite, then the rank of .& isa.s. p and the set {teR; s(t) < b} is as. 
bounded for every 0< b <o. Thus a minimizer Pp; of ¥ exists a.s. and has 


the property that makes |[5]| small. As is shown in Jaeckel (1972) in 
connection with the linear regression model, it will follow from the linearity 


result given in Theorem 7.3b.1 below that p; and Pp are asymptotically 


equivalent. Note that the score function y need not satisfy (1) in this 
theorem. 

Some steps of the proof of Theorem 7.3b.1 heavily depend on the 
representation of the AR(p) process {X;} in terms of the error variables 
Le}. For that reason we shall now extend the index i in the process {X;} 
to both sides of 0. Accordingly, assume that {e;, i = 0, +1, +2, ....} are 
lid. F r.v.’s and that 


(3) Xg= ppXa-rt+ poXint...+ ppXiptei, i= 0,41,42,...., p eR. 
In addition assume the following: 

(4) All roots of the equation 
p-2 


xP — px?! — pox 1+. — Pp =0 are in the interval (—1, 1). 


It is well known that if E| ¢|°< w, there exist constants {6;, j > 0} 
such that 0) = 1, Yj>0 | | < o, and that 


(5) Xi= es 6:4 €, 1=0, +41, 42,....,in Lo andas., 
»1 


where the unspecified lower limit on the index of summation is -—». See, 
e.g., Anderson (1971) and Brockwell and Davis (1987, pp 76-86). Thus {X;} 


is stationary, ergodic and EIYoll" < o. Hence (7.2.a1) implies (7.2.a3). 
Moreover, the stationarity of {Yi-;} and El Yoll’ <o imply that V 7 > 0, 


(6) P( max [Yi] 2 nn?) ¢ {m7}? Y Blin]? Tl ¥i-all 2 m/”) 


-2 2 1/2 
= 9? Ell Yoll71(|{Yoll 2 m’/*) = o(1). 


Thus (7.2.a2) holds. These observations will be used in the sequel 
frequently, some times without mentioning. 
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With this preliminary background, we now state 
Theorem 7.3b.1. (A.U.L. of R—statistics). Assume that (3) and (4) 


above hold. In addition, assume that F satisfies (F1), (F2) and that the 
following hold. 


(cl) (i) Ee =0. (ii) 0< Ee <o. 


(c2) y is nondecreasing and differentiable with its derivative 
being uniformly continuous on (0, 1]. 


Then, for every 0< B <a, 


(7) oI, In/7{5 (pen /2u) — } + w E QI] = 0p(1), 


where 5 := (51, ..., Sp) with 
: _7 0 em = 1 
=n (Xij-XMAF(a))-6], O= f olt)dt, 1<i<p, 


1=j +1 
Xj:=n' fb Xi, Q := fi dy(F), 
Y= ((A(k-j)), 1 ok <p; 1¢i <p; A(k) = Cov (Xo, Xx), 1<¢k <p. 
Before proceeding to prove the above result, we shall state a lemma 
giving the asymptotic continuity of certain basic r.w.e.p.’s. Accordingly, let 


h be a nonnegative measurable function from [0, 1] to R, U denote a 
uniform [0, 1] r.v., and define 


B(t):-= nV? ¥ Xs5 [MF(e)) P(e) < t)— A(t), OS tS, 1S ]¢P, 


t 
where H(t):= E A(U)I(U < t) = J h(s)ds, 0< <1. 
The proof of the following lemma will be given in the subsection 7.3d. 


Lemma 7.3b.3. In addition to (3), (4) and (cl1(i)) assume that Eh‘(U) < 
wo. Then, V y>0, V 1<jK<p, 


limlimsupn P( sup | S(t)-— S(v)| > 7) = 0. o 
N+ 0 |t-v| <q 
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Proof of Theorem 7.3b.1. Observe that with 
S(u):= n° % Xi5 o(Riu /(n+1)), we’, 1<j<p, 


1/2) @. _@. -1/2 
(8) ,,gup n°“ 15j(u) — 5i(u)| <P max, [Xx] [loll 2 “0, as.. 


Thus it suffices to prove the theorem with {5S;} replaced by {5;}. Let 
S’:= (Sj, ..., Sp). Observe that 


Gu) = od Yi-1 o(Riu /(n+1)), weR?. 


The proof is facilitated by centering S. Accordingly, define 


M(u) = n° 3 Yia[o(Rim /(n+1)) — 9], ue R?, 


a 


M:= 073 Yi [o(F(4)) - 9] 


As in the proof of Theorems 7.2.1, 7.2.2, let M(u), Fn(-, u), etc. stand 


for M(ptn !/ *u), F,(-, ptn *u), etc. Thus, e.g., Fn(-, 0) now stands for 
the empirical d.f. of ¢;, 1<i<n. Write F,(-) for Fy(-, 0). Recall, from the 


1/2.’ 


proof of Theorem 7.2.2 that ¢y = ej -n “"u Yi-, n ‘Rin = F,(€in, 0). 


Now, let 

eniu = n'/? [(Riu /(n+1)) — F(e)], 1<i¢n, wer? 
We first prove the 
(9) Claim: SUDiou notes = 0/(1). 


As in the proof of Theorem 7.2.2, the supremum w.r.t. i, u will be over 
1<i<n, MB), respectively, unless mentioned otherwise. 


To begin with, |{n(n + 1)° —1j| = O(n‘) implies that 
(10) sup |n/eniu —[Fa(€iu, u) — F(ei)]| = O(n"), as. 
lou 


Now, in view of (3), (4) and the discussion preceeding the stament of 
this theorem, it follows that {Xj;} are stationary, ergodic and hence by 
(cl1(i)) and the Ergodic Theorem, (1/n)¥; Yi. —*> EYo = 0. This together 
with (6) above, Remark 7.2.3 and (7.2.14*) imply that 
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n/?1 F(x, u) — Fa(x eit): 
el Rayep 2 (Fabs 8) — Fata) = on(2) 


This together with (6), (10) and (7.2.19) readily imply that 
n/e. iy = [Fa(eiu) — F(e:)] + Op(n 2?) 
(11) = [Fa(e:) — F(e)] —n /2u Yi f(€:) + Op(n 1”), 


where, Op(n / 2) is an array of processes in (i, u) that converge to zero, 


uniformly in (i, u), in probability, at a rate faster than n 1/2 


Now the Claim (9) follows from (*) the Glivenko — Cantelli Lemma 
and the assumption (F1) that ensures |[fl|_< o. 


Next, define 
T(u) := n di Yi-1€niu (F(e:)), uc RP. 
Note that 
Eel -1/2 = 
M(u) =n ~ ¥y Yi-s (V(F(ei) +0 Cniu) — ¢ |. 


Therefore, from the uniform continuity of y, the facts that n ‘Sy | Ya-] = 
O,(1) = n ‘Il; YiaYi -4|], which in turn follow from the assumption Ee< a 
and the Erogodic Theorem, and from (9), one readily concludes that, with 
Ui= F(€;), 

1/2 - 

|n/7[M(u) — M] — 7(u)| 
= [jn 75; Vif Uien ent) — (Us) — 2 eniw AUi)} 

(12) = )(1). 


Next, we approximate 7(u). Again, by the Ergodic Theorem, the 
independence of Y;-; from ¢€;, i> 1, and Ee =0 imply that 


n ‘3; Yi-1(U;) f(€;) — 0, a.s. 
Hence by (11), 
(13) 1(u) = a 75; Yil{Fu(es) — F(ei)} — 7a ¥i-af(es)] (Us) + Op(1) 


= Van —U Li + Op(1), 
where now 0),(1) is a sequence of stochastic processes converging to zero, 
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uniformly in uw, in probability, and where 
Va =n )/? 3. G4 [Fa(ex) — F(es)] (F(€:)), 
In =n) Dy Yus¥i- flex) O(F(es)). 
Note that 
EL, = E(n 3; YinYin)Q=2Q, Q= ffdy(F). 
By the Ergodic Theorem, 
(14) Ln, — ZQ, as.. 


Our next goal is to approximate Vy. To that effect, let Vnpj denote 
the jth component of Vy. Define 


nj(x) = n /? Ys Xi5 O(F(ex)) I(ex < x) 
Yoj(x) = 0 /?  Xi5 f OF(y)) dF(y) =n /? ¥; Xi5 YF(X)), 


a 
JG nj(X) = Wnj(x) — Vnj(x), x € R, 
= Bni(F (t)), O<t<1, 1<j¢p. 

Observe that 


Vai = f[Pa—F]d %pj = f [Fn —F] d.%j + f [Fn —F] dy 


1 
=~ f [%xi(Fa'(t)) - Haj(F“(t))] at — fr0j dP — FF 


re “ 
=— f LS CFPS()) — Bl) at — fro; alFa 
But, 2 isa process of Lemma 7.3b.1 with h = y. Hence 
(15) me | Vij + f vj d(F,-F)| 
~ -j ~ 
£ oR cts | 3 (F(Fo (t))) - 2(t)| = op(1), 


by Lemma 7.3b.1 and the fact that sup||F(Fa-(t)) — t|; 0<t<1] = o/(1), 
which in turn follows from Lemma 3.4.1. 
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mn _1 Nn 
Next, observe that, with Xj =n per 1<j<p, 
1= 


f vj (Fn —F) = 0? 8 Xi5 3s [O(F(4)) - 9] 
= Xjn/? 5; [ (F(a) — 9]. 


Let X:= (Xj, ..., Xp) and T=n //? 5; [y(F(e;)) —@]. Then from 
(13) — (15) we obtain 


(16) Va=-XT+0,(1), (uj) =-KXT-¥uQ+0,(1). 
From (12), (16) and direct algebra one now readily concludes that 
M(u) = M—X T-¥uQ+0,(1) 
=n? ¥ (Yin —X) [o(F(e)) —9] — 20 Q +5,(1). 


Now argue as for (5) to conclude that 
1/28 vy, ,_X) [ol Fle,)) -3) —S] = 
In 2 (Yi — X) [o(FCa)) — 9] — Sl = op), 


thereby completing the proof of (3). Oo 


Remark 7.3b.1. Note that the same proof shows that under the 
assumed conditions, for every 0 < B <a, 
-]j 2 4 
jouD, 1S(on a) — $(p) +a E ffae(F) I = op(1). : 
U ~_ 


Remark 7.3b.2. Argue either as in Section 3.4 or as in Jaeckal (1972) 
to conclude that |[n'/?(a, — p)|| = Op(1) and that |In/7(—, — p,)Il = op(1). 
Consequently by Theorem 7.3b.1, 


(17) n/p, — p) =0/7(p, — p) + 0,(1) = Q° E* S+0,(1). 


Observe that S is a vector of square integrable mean zero martingales 
with ESS = o% ¥, om := Var.(y(U)). Thus, by the routine Cramer—Wold 
device and by Lemma A.3 in the Appendix, one readily obtains 


< 2 
(18) S = N(0, Ty Y), 
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1/2; 1/2;~ _ n72_2 wl 
(19) i (p, — p) a N(0O, y), 0 (2, — p) a N(0, V),V= Q 71% - O 


Remark 7.3b.3. See the recent paper of Koul and Ossiander (1992) for 
an extension of the above results to any ye @ of (3.2.1). o 


7.3c. ESTIMATION OF Q(t): = ff dy(F). 


As is evident from (7.3b.19), the rank analysis of an AR(p) model via the 
above R-estimators will need a consistent estimator of the functional Q. In 
this subsection we give two classes of consistent estimators of this functional 
in the AR(p) model (7.3b.3), (7.3b.4). One class of estimators is obtained by 
replacing f and F in Q by a kernel density estimator and the empirical 
d.f. based on the estimated residuals, respectively. This is analogous to the 
class of estimators discussed in Theorem 4.5.3. The other class is an 
analogue of the class of estimators discussed in Theorem 4.5.1 in connection 
with the linear regression setup. 


Accordingly, let p be an estimator of p, K be a probability density 
in R, hy, bea sequence of positive numbers, h, — 0 and define, for x € R, 


€é4:= Xi— p’ Vie, 1¢61¢ 2; F,(x) =F a(x;'p) = n? Di I(€4 < x), 

- -1 — £: -1 — €: 

f,(x) := (nhy) © Oy K(-— f(x) := (nhn) © dy KA). 
Finally, let 

On — fifa dy(F 1). 

Theorem 7.3c.1. In addition to (7.3.b3), (7.3.b4), assume that Ee, = 0, 
Ee, < w. Moreover, assume that (F1), (F2) and the following conditions hold. 
(i) yve¢:={y: y a nondecreasing function on [0, 1], (0) = 0, y{1) = 1}. 
(ii) ba >0; hy—0, n/*h,—o. 
(iii) K is absolutely continuous with its a.e derivative K satisfying f |K|<o. 
. 1/2;~ 
(iv) |jn'/?(@ — pl] = Op(1). 
Then, 


(1) sup |Qn — Q(f)| = op(1). 
YE 6 
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Proof. The proof is similar to that of Theorem 4.5.3, so we shall be 
brief, indicating only one major difference. Unlike in the linear regression 
setup, i.e., unlike (4.5.11), here we have from Remark 7.2.3, 


(2) sup, n/?|#,(x) — Fa(x)| = op(1), where F(x) = Fa(x, p). 


In other words the linearity term involving ni/ *(p —p) is not present in the 


approximation of F,. Proceeding as in the proof of Theorem 4.5.3, (2) will 
yield 


fn — fall, § (n/7hn) + fn/?[F, — Fall = f 1K 
= 0p((n’/7hn) *) = 09(1). 


Compare this with (4.5.19) where Op((n'/ *n,) +) appears instead of 
op((n *n,) +). Rest of the proof is exactly the same as there with the 
proviso that one uses (2) instead of (4.5.11), whenever needed. o 

The reader may wish to modify the above proof to see that Qh 
continues to be consistent for Q even when Ee # 0, so that the term that is 
linear in n?/ *(p —p) is now present in the expansion of Fp. 


We shall now describe an analogue of Qf of (4.5.6). The motivation is 
the same as in Section 4.5, so we shall be brief on that also. Accordingly, let 


B(y) = f [Pa(yex) — Fa(-yex)] d O(Fa(x)), 20. 


Observe that p is an estimator of the d.f. of the absolute difference | « — 7], 
where € and 7 are independent r.v.’s with respective d.f.’s F and g(F). 
As in Section 4.5, one can use the following representation for the 
computational purposes. 


Bly) = 2 * 3 [yi/n) — A(H-)/a)] Bley Eyl Sy) ¥2O, 


where {€, 4) } are the ordered residuals {€;} from the smallest to the largest. 
Now let tn denote an ath percentile of the d.f. p(y) and define 


Oe = nl/? p(n 1? tn) /2tn, O<a<l. 


The consistency of these estimators may be proved using the method of the 
proof of Theorem 4.5.1 and the results given in Corollary 7.2.1. The 
discussion about the choice of a etc. that appears in Remark 4.5.1 is also 
pertinent here. 
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Another class of estimators is obtained by modifying Q, by replacing 
F, by the estimator F(x) = f fa(y) I(-w<y<x) dy. The consistency of 


these estimators can be also proved by the help of Corollary 7.2.1. oO 
7.3d. PROOF OF LEMMA 7.3b.3. 


The proof of Lemma 7.3b.3 is similar to that of Theorem 2.2a.1(i) and will be 
a consequence of the following two lemmas. 


Lemma 7.3d.1. In addition to (7.3b.3), (7.3b.4) and (7.3b.c1) assume 
that the following hold: 


(dl) Thed.f. F ts continuous and strictly increasing. 


(d2) The function h on (0, 1] to R is nonnegative and f | h(t) |“at< w. 


Then the following hold: 
(A) Forany 0<u<v<w<l and forall 1<j<p, 
(1) lim supn E{ S(v) — 3(u)}"{( B(w) — B(v)}’ < C mm,, 
where m, := im h’(t) dt, m2 := h’(t) dt, C is a constant given in (19) 
below. 
(B) Forany 0<u<v<1, and for 1<j<p, 
lim sup, E { S(v) — G(u)}* < C mi. 


Proof. (A). Since u, v, w, are fixed, we shall suppress these entities 
in the notation. Let Fy, := o—field{e;; i< k}, k= 0,41, 42, .... Further, to 


simplify writing let x =F 4(u), y =F ‘(v) and =F (w) and define 
(2) pr=A(v)—Hlu), p2o= Hw)— Hv); ao=1—-pj, j= 1,2, 
ay := MF(ei) I(x << y)—p1, Bi := A(F(ei))I(y < €1 < 2) — po. 
Then 
(3) { B(v) — S(w)}"{ Bw) — Bw)’ = wi Xs jan)” (Lr Xr 58s)” 
In carrying out the computations that follow we have repeatedly used 


the following facts: aj, 6; are centered; aif; are Fy-; measurable for all 
i<k and Xj; is 7;-; measurable and independent of ¢€;,i>1. Thus, 
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(4) E aij =0=E fy, for all i,k. 
EX4jXx-j iP = E[Xi-jXx-jOiE (Px |Fx-1)] = E[Xi-jXu-jai] E(Gk) = 0, i<k; 


EX; 5XxjX2ja% m, f= E[X4-j;X%jX? jf aE ( ax |Fx-1)] = 0, i,;r<k. 


Using facts like these one can write 


(5) E(d; Xi-j ai)"(Zr Xrjhr)” 
= DEX{y af Ai + DDEXi 5 Xtyai fr 
+40 DEXTGXL; 05 Bi ox A 
+ 2 Hdd E Xi4 Xk X75 (a4 on Br + Bi Br ar) 
2 
+40 20 E Xig Xrj Xk-j Oi Br On Pr 
2 
+40 0 2 E Xij Xrj Xk o1 Br On Pr 
=7T,+T.+4T3+ 2(T4 + Ts) + 4(T¢ + Ty), Say. 


We shall now show that n °T; — 0, for j= 1,4, 5,6, 7, and that 


lim sup n *(T. + 4T3)< C mmo. The basic idea of the proof is to exploit 
the hierarchal nature of the process. Observe that had the underlying 
observations been independent then T; would have been equal to zero for j 
= 4, 5, 6, 7. However, under (7.3b.3), {Xj} are not independent but 
asymptotically behave like independent r.v.’s. This is the reason to expect 


nT; 40 for j= 4,5, 6, 7. 


The details of the proof of n°; tending to zero for each j = 4, 5, 6, 
7 are elementary and cumbersome but similar. So the details will be given 


only for nT, — 0. To this effect, observe that 

E(XiXrjXkjoiSronSk) = E XijXrjXkjaiBrE(on Ak|Fe1), i,t <k. 
Moreover, {¢;} i. i. d. implies that for all k > 1, 

E( ax | Fx-1) = (1—p1)pi(—p2)+(—p1)(1-p2)p2+p ip 2o(1—p1-p2) = —pips, 


and, in addition, Ee = 0 implies that EX;4X;jX¢jai =0, r <i, k-j ¢i-. 
Therefore, 
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(6) n°“T;= —a™ p pal oe EXj-; Xr-j X3 a Br 


Q 
FED B EXiG Xry Xk y a1 Ar} 


= — pip2 n {Tr + T 79}, Say. 
Now, for a convenient reference we rewrite (7.3b.5) as 
(7) X;= mT bik €k; i>, as., 


where, as in (7.3b.5), the unspecified lower limit on the index of summation 
is —w, and {4} are real numbers satisfying @) = 1, A; < o, with 


q 
Aq := Dero | | » gil. 
Note that sup, | %&| < A; and hence A; <m implies that 


(8) A,<Ai<o,  forallq>1. 
Next, define 
(9) Agen := Ce On €r, 
n n 
Hack = Xu Om-r €ry O<n<m<oa, k <n. 
re: 
ar:=E(ae'), br:=E(Pe), pr=Ee, 1<r¢4, 


of = Var.(a), of:= Var.() 


where a, § are copies of a;, J1. Observe that 


(10) Hn yk = Anon = An,k-1; k ¢n¢ m, 
i-t 

Xi = Aigi = Aivig + Hi,ije+ 64 = Aisin + 64, V i, 

of < mx, where my, is as in (1), k = 1, 2. 


Morover, {¢;}i.i.d., E(e) = 0, and (8) imply that for all n < m <a, 
(11) EA2,, = » OF ur < pn Ai <o, 
r7m-n 


EAgon = aie + 38 on ee 4, Aor < (4+ 2) A4 < o. 
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For the same reasons, from (10) it follows that 
E{X? oe; | 4-1} = 2 Aiyi-t a1 + a9, for all i. 
Use this and argue as for (6) to obtain 


Tr = Dy E Xyjfr {2a1 Ly + a2 Xz} 
+ Liao E X;-jBr {2a;L; + aoXi-j} 


= T71+T712, say, where Ly := Xj Ai,i-t1. 


The C-S inequality, the staionarity of the process {X;} and (11) imply 
that for all r, j, 


E| Xy-Ar Lr4j| < {E(X156r)" EL, }/2 < Ds C4 < o, 
E|X;-j6; Ly] < D, C4 < Q, 


where D,, Cy are constants depending on the kth moment of jute and the 
kth moment of «€ and Ax, respectively, 1<k< 4. These facts imply that 


(12) n7|Tr] = O(n *) = 0(1). 


Next, to handle T 71, use (11) to obtain that for i—j > r+l, 


Ly = {Aj-jyr-1 + Oj-j-r €r + EEE AEP OOo + Oy €r + Hist. 
Use the above type of conditioning argument to obtain that 

EX, By Ly = EXy5 {[Oj-+ Aijor-1+ 0:31 Aisr-1] D1 + Oi-j-1 Oi+ bo}, 

EX, Br Xi-g = EXrj{ Oj be + 20:52 Aigor-t bi}, j-j2rt+1. 
Use these facts together with (11) and an argument like the one that led to 
(12) to conclude that n °|Tri2| = O(n *). This and (12) yield 
(13) n*|T7| = O(n -) = o(1). 

Now we turn to T72. Using (10) write 

Xx-j = Ax-joiat Aji €i+ He as k—j > i+l, 


and use arguments like those above to obtain that 


E{X¢.,; ai|Fi-1} = Oe 54 ao + 2Ax-j5i-1 Ak -j-i a1, k—j > i+], 
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so that 
(14) T= Be EXrj Pr Xi {0-5-1 82+ 2Ax-ji-1 Ok j-i ay} 
= a2 T 721 + 2a1 T 729, say. 


Arguing as above and using the stationarity and the fact that EX») = 0, one 
obtains 


EX;.j Br Xi-j = EX;.j Br {Ai-j,r-1 + 0; -; -r €r + Hijrei} 


= EX,.j 6ij+b;=0, ij > rel. 
Thus, 
|Trai| < a2 de a | EX, 6 Xig| = 0. 
Similar arguments show that ln ?T799| = o(1) thereby completing the 


proof of n“|T79| = 0(1). This together with (13) shows that 


n ?|T7| = 0(1). 
Now consider T2: Rewrite 
2 2 2 
T2= >» + ») (EX4-; Xr-j ai 62) = To1+To2, say. 
Again, by a conditionning argument, 


(15) Ta = 02 + DD EXi4 Xr4 ai 


= 03° ( EX}; X rj ai) 


> 
i<r5r-i§j ee are 
2 
= 02 + {To1+ Toi}, say. 


Again, the C—S inequality, the stationarity of the naar {X;}, the 


assumptions (7.3b.cl) and (d2) imply that 0 < Toy; ¢ j-n- EX9 = O(n), by 
(8) and (10), so that 


(16) nT o41 = O(n +) = (1). 
Next, argue as for (14) to obtain 


2 2 2 
Too = 2 Y BEX Xr-i Qi 
a ey Oe 


2 9 rj 42 
EX45 a4 {Arj,i-t + Or-j-4 €i + Hr5, iss} 


D> 
i<r3r-i2j +1 
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2 
—— 01° 


2 2 r -j 9 
B EXG5{Atjia+ YO jm Ee 
i<rjr-i>j+ ac { r-jsi-l* sey ee } 


2 
- eA +1 EX} Arjria Oj 


(17) = 07 - Bj +2c-By+O(n), say, c = E(ea)’. 
The C-—S inequality and (11) yield that 
(18) n °B,< C <o, forall n> 1, 


where C is aconstant depending on p,q and A,. A similar argument shows 
that |B2| = O(n’). This together with (17) and (18) yield that 


(19) lim supp iad ET <C a2. 
Hence, from (17) — (19) one readily obtains 


lim supa n”|Ta| < C of of <C mumz, __ by (10). 


Similarly, one concludes a similar result for T 22 thereby enabling one 
to conclude 


(20) lim supp n ?|To| < C mmo, where C is as in (18). 


Finally, consider n “T;: By arguments similar to those above we 
obtain 


(21) n °T; = —n “ys » = ts ai Gi Dips 


=—pipon EX}; a; 6; X75 + O(n’). 


“eee; +1 
Let cr = E(a@ ec’). Use (10) and procced as before to obtain 
2 Dn iy? 2. rd 6. 
EX}{-; Qj Bi Xr-j = EX4-;{—pip2 (Ar-ji-1 + [2 a Y -j -M ) 
+ Op 5-402 + 21 Arjsi-t 5-1}. 


Combine this with (21), argue as above using (11) and the C—S inequality, to 
obtain 


n °T; = = (pipe) n E X4.[A2 r-j»i-1 + Mn 3 ia aT O(n '), 


- y -idj+ 


Another application of (11) yields 
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lim supp |n “T3| < C (pyp2)? < C mi m2, 


where C is as in (18) above, because 
2 sv 2 ge a fh 72 
pi=1f Mt) dt}? < (v—u) f A(t) dt < my 
p? = {f" h(t) dt }? < (w —v) fi" W(t) dt < mp, 
The proof of (A) is now terminated. 


PROOF of (B). Fix j and define, for r>1,k>1, 


Urk = E{(Xkj ox)" |Fu-1} = Xk-j E(ok|Fu-1); 
k k 
Un, = 2 uyi; Sk = L Xij ai. 
1=1 1=1 
Now observe that 
(22) (v) — G(u) =n “Sp. 
Because {Xj-; aj} are conditionally centered, gives ¥;-,, it readily follows 
that {Sn, Fa} is a mean zero martingale. Therefore, from Chow, Robbin 
and Teicher (1964), 
n 
(23) E Sq = E{Usn + 4 Sp Usn + 6 Se Urn — 6 2 u2j Uri}. 
But, 


n Bs aspect S E(X2 a2 
EY uk Ua = BLY E(Xi-j ol Fit): 2 E(Xi5 ai] Fi} 


1 
ni2 2 &.2 2 2 w2 4 
=F Xk-j 1° UX ti = oR. E Xj Xk-j ° 71, 
n n 3 
E SnUan = E{ © Xig aie UD E((Xk-j Ox)" |Fe-1)} 
= EX Xe a;-E(a°), 


E $a Um = 04 + E{(E Xij a + 2 BD XijXrjaian)( B Xe4)} 


2 9 9,2 , 2 Deed) 
BENGE Sg eee 8h oe ed 


2 
+20 02 EXiy Xrj Xj 4 orl, 
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E Um = DE X44 + E(a’). 
1 


Combine the above with (23) to obtain 
nE Sf = n7{ 3 EX{, + E(a‘) + 193 EX;;X25 a; - E(a’) 
+ 60% - [3,3 EX} jq3Xk5 + o1- 3, Fas EX? X?.; 
+20 D EEX; Xr X25 ay a)}. 
=n “{K,+4K2+6 0% [Ks + 0% Kg + 2 Kg}, say. 
Arguing as for the proof of (A), one can show that n °K; —0,j=1, 


2,5, that lim sup n “|K3|< C o%, and that lim sup n °K, <C. Hence (B).o 


Lemma 7.3d.2. In addition to (7.3b.3) and (7.3b.4) assume that Ee = 0, 
Ee < o, and Eh?(U) <wo. Then the finite dimensional distribution of &Z, 


for every 1<j< p, converges weakly to that of {E(X)"}/ ap -), where B is 
the Brownian motion in €[0, 1| with the covariance function H(u) —H(u)H(v), 
O<uc<vi¢l. 


Proof. The proof uses Corollory 3.1 of Hall and Heyde oe p 58)(see 
Lemma A.3 in the Appendix) and the Cramer—Wold device. Accordingly, fix 


j and let 0<uj< uo<.... < up <1, OER’. Define 
r 
On(ei) = Ax {a(F(e))I(F (es) ¢ ux) — G(ux)}, 
-1/2 oe te : 
fa = 11 Xi-j On(€4), A eons 1<i¢n. 
r 
Note that Spn = week Z(ux). Because of the given assumptions, and 


because &n; is conditionally centered, given F;-1, {(Sni, Fi-1), 1<i<n} isa 
mean zero square integrable martingale array. Next, fora 9> 0, by the C-S 
inequality, 
nh 
© Eléni I(| nil > 8)| Fi] 
_71 Nn 
=n * DEX, {Elan(e)I(|Xij an(ei)| > On!) | Fi]} 


“fs n 
cad EX3; PM%([Xi5 aa(es)| > On /?|F4) +» Dass 
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_7 n 
< (n°?) 2 B|Xi915EY/7U| on(€4)|Fial-Dave 


€ C3D4,7(6n!/?)? = 0(1) 
where, in the above, D4,,; is a constant dependin on r, D4 and @ and C3 isa 
constant dependin on p3 and d 
Next, from the definition of H in terms of h one readily sees that 


n = n 
2 E(éni|Fia) =n BD X35 Elan(ei) |Fi-1] 
-1 n y) T r 
=n 2, Xi 7H 27 AxAm [G(uxAUm) — G(ux) G(un)] 
= E(X0), B,D, ArAn[G(uxAte) — G(ux)G(ua)] + o(1), 


by the Ergodic Theorem. 

The above calculations show that {Syi, Fi4, 1 <i <n} satisfy the 
conditions of Lemma A.3 and hence Snn converges weakly to an appropriate 
normal r.v. This completes the proof of the Lemma. o 


Proof of Lemma 7.3b.3. In view of the Lemmas 7.3d.1(A) and 7.3d.2 
above, the proof uses Lemmas A.1, A.2 and Theorem A.1 in the npReees 
and is “exactly like that of Theorem 2.2a. 1(i). 

7.4. M.D. ESTIMATION 
In this section we shall discuss two classes of m.d. estimators. They are the 
analogues of the classes of estimators defined in the linear regression setup at 


5.2.11) and (5.2.20). To be precise, consider the autoregression model 
7.1.1) and define, for a GeDZ(R), 


(1) Kg(t)= 3) ffm? E (Xs) {UKs ¢ xet Yi) — FO)? dG(), 
Kat) = 3, fn? EX) ¢ xt Yi.) 
—I(-X; < x-t Y;)}]* dG(x),  teR?, 


In the case the error d.f. F is known, define a class of m.d. estimators of 
p to be 


(3) pz := argmin{K,(t); t € R°}. 


In the case the error distribution is unknown but symmetric around 0, 
define a class of m.d. estimators of p to be 
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(4) pz = argmin{K,(t); t € RP}. 


Note that the role played by the vectors {nx / *1e(X;-1), g(Xi-2), ..., g(Xi-p)]; 
1<i<n} is similar to that of the vectors {dyi; 1 <i< n} of Chapter 5. To 


put it in matrices, the precise analogue of D is the matrix n 1/2 Y, where 
Y is as in (7.3a.1). 

The existence of these estimators has been discussed in Dhar (1991a) 
for p = 1 and in Dhar (1991c) for p > 1. For p = 1, these results are 
relatively easy to state and prove. We give an existence result for the 
estimator defined at (4) in the case p = 1. 


Lemma 7.4.1. In addition to (7.1.1) with p = 1, assume that either 
(5a) xg(x)>0, V xeR, or (5b) xg(x) <0, V xeR, 
Then, a minimizer of Ky exists if either G(R) = » or G(R) < » and g(0) = 0. 
The proof of this lemma is precisely similar to that of Lemma 5.3.1. 
The discussion about the computation of their analogues that appears 
in Section 5.3 is also relevant here with appropriate modifications. Thus, for 
example, if G is continuous and symmetric around 0, i.e., satisfies (5.3.10), 
then, analogous to (5.3.7*), 
p n n 7 v4 
Kg(t) = 2B 2 (Xis)e(Xei){|GXirt Yin) — G(-Xiet Yi-a)| 
— |G(Xi-t Yi.) — G(X,-t Yy-1)| }. 


If G is degenerate at 0 then one obtains, assuming the continuity of the 
errors, that 


n 7 
(6) Kg(t) = B1E g(Xia)sign(Xi - t Yia)], wep.l. 


t's 


One has similar expressions for a general G. See (5.3.7) and (5.3.77). 


If g(x) = x = G(x), pg is mle of p if F is logistic, while pz is an 
analogue of the Hodges—Lehmann estimator. Similarly, if g(x) = x and G is 


degenerate at 0 then pz is the l.a.d. estimator. 


We shall now focus on proving their asymptotic normality. The 
approach is the same as that of Sections 5.4 and 5.5, i.e., we shall prove that 
these dispersions satisfy (5.4.41) — (5.4.A5) by using the techniques that are 
similar to those used in Section 5.5. Only the tools are somewhat different 
because of the dependence structure. 
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To begin with we state the additional assumptions needed under which 
an asymptotic uniform quadraticity result for a general dispersion of the 
above type holds. Because here the weights are random, we have to be 
somewhat careful if we do not wish to impose more than necessary moment 
conditions on the underlying entities. For the same reason, unlike the linear 
regression setup where the asymptotic uniform quadraticity of the underlying 
dispersions was obtained in L,, we shall obtain these results in probability 
only. This is also reflected in the formulation of the following assumptions. 


(7) (a) Eh*(Yo) < o. (b) 0< Ee? < o. 
(8) V |lull<B,aeR, 
f Eh?(Yo)|F(xen” /?(w’ Yoeall Yoll)) - F(x) |dG(x) = o(1). 


(9) There exists a constant0<k<o,3 V 6>0,V |lull < B, 


_-1, ~ = 
lim infa P( f'n *[ Bh*(¥i-){F(xen Marys en ayy ll)- 
~ F(xen a Yn ays lf) }]’dG(x) < 62) = 1, 
where h* is asin the proof of Theorem 7.2.1. 


(10) For every |lul] < B, 


f a [3 b(Yia){F(xen 2a Y54)-F(x)-n 2/7 a’ ¥j-if(x)}]?dG(x) = 0,(1), 
1= 
and (5.5.68b) holds. 
Now, recall the definitions of Wh, m, *, W*, aT. Ww", Zt m* from 
(7.1.6), (7.2.2), (7.2.5) and (7.2.6). Let |-|, denote the Ly-norm w.r.t. the 


measure G. In the proofs below, we have adopted the notation and 
conventions used in the proof of Theorem 7.2.1. Thus, e.g., & = Yi-1; W(-), 


Vy(-) stand for H(-, pan /? 


Lemma 7.4.2. Suppose that the autoregression model (7.3b.3) and 
(7.3b.4) holds. Then the following hold. 


u), v(-, pon /7u), ete. 


(11) Assumption (8) implies thatV 0<B<a, 
E f [2° (x; u, a)-Z"(x; u, 0)J"dG(x) = oft), V lull < B,aeR. 


(12) Assumption (9) implies that V 0<B<ao,V |lul| < B, 
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lim inf, P( sup nl/ | (x, pen if *y) — v(x, pen i 2a) in <k62) = 1. 
6 


v-ujl< 
where k and 6 are as in (9). 
(13) Assumptions (7), (9) and (10) imply that V 0 <B <a, 


-1/2 


sup jaca J In! {on(x, pen 7a) - vu(x, 0)} 


_4n 
-u'n * Eh(¥i-1) Yi f(x)]°dG(x) = 0)(1). 
Proof. Let, for $x, a \in \mathbb{R}$ and $u, y \in \mathbb{R}^p$,
(14) $p(x,u,a;y) := \big|F\big(x + n^{-1/2}(u'y + a\|y\|)\big) - F(x + n^{-1/2}u'y)\big|$.

Now observe that $n^{1/2}[Z^{\pm}(x;u,a) - Z^{\pm}(x;u,0)]$ is a sum of $n$ r.v.'s whose $i$th summand is conditionally centered, given $\mathcal{F}_{i-1}$, and whose conditional variance, given $\mathcal{F}_{i-1}$, is $\{h^{\pm}(\xi_i)\}^2 p(x,u,a;\xi_i)\{1 - p(x,u,a;\xi_i)\}$, $1 \le i \le n$. Hence, by Fubini, the stationarity of $\{\xi_i\}$ and the fact that $(h^{\pm})^2 \le h^2$, for all $u \in M(B)$,
$$\text{l.h.s. of (11)} \le \int E\, h^2(Y_0)\, p(x,u,a;Y_0)\,dG(x) = o(1),$$
by (8) applied with the given $a$ and with $a = 0$, and the triangle inequality.


To prove (12), use the nonnegativity of $h^{\pm}$, the monotonicity of $F$ and (7.2.10) to obtain that, for every $\|u\| \le B$, the conditions $\|v\| \le B$ and $\|v - u\| \le \delta$ imply
(15) $n^{1/2}|\nu_v(x) - \nu_u(x)| \le |m^{\pm}(x; u, \delta) - m^{\pm}(x; u, -\delta)|$, $\forall\, x \in \mathbb{R}$.
This and (9) readily imply (12), as the r.v. in the l.h.s. of (9) is precisely the $|\cdot|_G^2$ of the r.h.s. of (15) for each $n \ge 1$.

The proof of (13) is obtained from (7), (9) and (10) in the same way as that of (5.5.30) from (5.5.7), (5.5.8) and (5.5.9); hence no details are given. □


Lemma 7.4.3. Suppose that the autoregression model (7.3b.3) and (7.3b.4) holds. In addition, assume that (8) and (9) hold. Then, for all $0 < B < \infty$,
(16) $\sup_{\|u\| \le B}\int [\mathcal{W}^{\pm}(x, \rho + n^{-1/2}u) - \mathcal{W}^{\pm}(x, \rho)]^2\,dG(x) = o_p(1)$,
(17) $\sup_{\|u\| \le B}\int [\mathcal{W}(x, \rho + n^{-1/2}u) - \mathcal{W}(x, \rho)]^2\,dG(x) = o_p(1)$.


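Proof. Let $q(x,u;y) := |F(x + n^{-1/2}u'y) - F(x)|$, $x \in \mathbb{R}$, $u, y \in \mathbb{R}^p$.

The r.v. $n^{1/2}[\mathcal{W}^{\pm}_u(\cdot) - \mathcal{W}^{\pm}(\cdot)]$ is a sum of $n$ r.v.'s whose $i$th summand is conditionally centered, given $\mathcal{F}_{i-1}$, and whose conditional variance, given $\mathcal{F}_{i-1}$, is $\{h^{\pm}(\xi_i)\}^2 q(\cdot,u;\xi_i)\{1 - q(\cdot,u;\xi_i)\}$, $1 \le i \le n$. Hence, by Fubini, the stationarity of $\{\xi_i\}$ and the fact that $(h^{\pm})^2 \le h^2$, for all $\|u\| \le B$,
(18) $$E|\mathcal{W}^{\pm}_u - \mathcal{W}^{\pm}|_G^2 \le \int n^{-1}\sum_{i=1}^n E\,h^2(Y_{i-1})\,|F(x + n^{-1/2}u'Y_{i-1}) - F(x)|\,dG(x) \le \int E\,h^2(Y_0)\,|F(x + n^{-1/2}u'Y_0) - F(x)|\,dG(x).$$

Therefore, by (8) with $a = 0$ and the Markov inequality,
(19) $|\mathcal{W}^{\pm}_u - \mathcal{W}^{\pm}|_G = o_p(1)$, $\forall\, \|u\| \le B$.

Thus, to prove (16), because of the compactness of $M(B)$, it suffices to show that for every $\eta > 0$ there is a $\delta > 0$ such that for every $\|u\| \le B$,
(20) $\liminf_n P\big(\sup_{\|v - u\| \le \delta}|L_v - L_u| \le \eta\big) = 1$,
where $L_u := |\mathcal{W}^{\pm}_u - \mathcal{W}^{\pm}|_G^2$, $\|u\| \le B$.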


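Expand the quadratic and apply the C–S inequality to the cross-product terms to obtain
(21) $|L_v - L_u| \le |\mathcal{W}^{\pm}_v - \mathcal{W}^{\pm}_u|_G^2 + 2\,|\mathcal{W}^{\pm}_v - \mathcal{W}^{\pm}_u|_G\,|\mathcal{W}^{\pm}_u - \mathcal{W}^{\pm}|_G$.

Observe that $h^{\pm} \ge 0$, $F$ nondecreasing and (7.2.10) imply that
$$0 \le |m^{\pm}(x; u, \pm\delta) - m^{\pm}(x; u, 0)| \le m^{\pm}(x; u, \delta) - m^{\pm}(x; u, -\delta),$$
for all $x \in \mathbb{R}$, $s \in M(B)$, $\|s - u\| \le \delta$. Use this, the second inequality in (7.2.9), (7.2.10), (7.2.11), and the fact that $(a + b)^2 \le 2(a^2 + b^2)$, $a, b \in \mathbb{R}$, to obtain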


248 AUTOREGRESSION TA 


$$|\mathcal{W}^{\pm}_v - \mathcal{W}^{\pm}_u|_G^2 \le 16\Big\{\int [Z^{\pm}(x; u, \delta) - Z^{\pm}(x; u, 0)]^2 dG(x) + \int [Z^{\pm}(x; u, -\delta) - Z^{\pm}(x; u, 0)]^2 dG(x) + \int [m^{\pm}(x; u, \delta) - m^{\pm}(x; u, -\delta)]^2 dG(x) + |n^{1/2}(\nu_v - \nu_u)|_G^2\Big\}$$
for all $v \in M(B)$, $\|v - u\| \le \delta$. This, together with (9), (12), (13), (19), (21) and the C–S inequality, proves (20) and hence (16).
The proof of (17) follows from (16) and the first inequality in (7.2.9). □


Now define, for $t \in \mathbb{R}^p$,
(22) $K_h(t) := \int \Big[n^{-1/2}\sum_{i=1}^n h(Y_{i-1})\{I(X_i \le x + t'Y_{i-1}) - F(x)\}\Big]^2 dG(x)$,
$\quad\ \ K_h^*(t) := \int \Big[\mathcal{W}(x, \rho) + n^{1/2}(t - \rho)'n^{-1}\sum_{i=1}^n h(Y_{i-1})Y_{i-1} f(x)\Big]^2 dG(x)$.

Theorem 7.4.1. Suppose that the autoregression model (7.3b.3) and (7.3b.4) holds and that (5.5.69), (7)–(10) hold. Then, for all $0 < B < \infty$,
(23) $\sup_{\|u\| \le B}|K_h(\rho + n^{-1/2}u) - K_h^*(\rho + n^{-1/2}u)| = o_p(1)$.
Proof. Observe that, by (5.5.69) and (7),
(24) $E\int \mathcal{W}^2(x, \rho)\,dG(x) = E h^2(Y_0)\int F(1 - F)\,dG < \infty$.

The rest of the proof of (23) follows from Lemmas 7.4.2 and 7.4.3 in a way similar to that of (5.5.28) from Lemmas 5.5.1, 5.5.2 and the result (5.5.30). □


Now we shall apply this result to obtain the required quadraticity of the dispersions $K_g$ and $K_g^+$. For that purpose, recall the matrices $\mathcal{Y}$, $\mathcal{G}$ and $B_n$ from (7.3a.1). Note that $X_{i-j}$ and $g(X_{i-j})$ are the $(i,j)$th entries of $\mathcal{Y}$ and $\mathcal{G}$, respectively, $1 \le i \le n$, $1 \le j \le p$. Also observe that the
(25) $j$th row of $B_n$ is $\sum_{i=1}^n g(X_{i-j})Y_{i-1}'$, $1 \le j \le p$.

To obtain the desired result about $K_g$, we need to apply the above theorem $p$ times, the $j$th time with
(26) $h(Y_{i-1}) = g(X_{i-j})$, $j = 1, \ldots, p$.


7.4 M.D. ESTIMATION 249 


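Now write $\mathcal{W}_j$ for $\mathcal{W}$ when $h$ is as in (26), and $\mathcal{W}_j(\cdot)$ for $\mathcal{W}_j(\cdot, \rho)$, $1 \le j \le p$. Note that
$$\mathcal{W}_j(x) = n^{-1/2}\sum_{i=1}^n g(X_{i-j})\{I(\varepsilon_i \le x) - F(x)\}, \quad 1 \le j \le p,\ x \in \mathbb{R}.$$

We also need to define the approximating quadratic forms: for $t \in \mathbb{R}^p$, let
(27) $K_g^*(t) := \sum_{j=1}^p \int \Big[\mathcal{W}_j(x) + n^{1/2}(t - \rho)'n^{-1}\sum_{i=1}^n g(X_{i-j})Y_{i-1} f(x)\Big]^2 dG(x)$,
(28) $K_g^{+*}(t) := \sum_{j=1}^p \int \Big[\mathcal{W}_j^+(x) + 2n^{1/2}(t - \rho)'n^{-1}\sum_{i=1}^n g(X_{i-j})Y_{i-1} f(x)\Big]^2 dG(x)$,
where
(29) $\mathcal{W}_j^+(x) := n^{-1/2}\sum_{i=1}^n g(X_{i-j})\{I(\varepsilon_i \le x) - I(-\varepsilon_i < x)\}$, $1 \le j \le p$, $x \in \mathbb{R}$.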


Before stating the desired results, consider the conditions (7)–(10) when $h$ is as in (26). Condition (7a) is now equivalent to requiring that $E g^2(X_{1-j}) < \infty$ for all $j = 1, \ldots, p$. Because of the stationarity of $\{X_i\}$, this in turn is equivalent to
(7a$_g$) $E g^2(X_0) < \infty$.

Similarly, (8) is equivalent to
(8$_g$) For all $\|u\| \le B$, $a \in \mathbb{R}$, $1 \le j \le p$,
$$\int E\, g^2(X_{1-j})\,\big|F\big(x + n^{-1/2}(u'Y_0 + a\|Y_0\|)\big) - F(x)\big|\,dG(x) = o(1).$$
Let (9$_g$) stand for the condition (9) after $h^{\pm}(Y_{i-1})$ is replaced by $g^{\pm}(X_{i-j})$, $1 \le j \le p$, in (9), $1 \le i \le n$. Interpret (10$_g$) similarly. We are now ready to state

Theorem 7.4.2. Suppose that the autoregression model (7.3b.3) and (7.3b.4) holds and that (5.5.68a), (5.5.69), (7b), (7a$_g$)–(10$_g$) hold. Then, for all $0 < B < \infty$,
(30) $\sup_{\|u\| \le B}|K_g(\rho + n^{-1/2}u) - K_g^*(\rho + n^{-1/2}u)| = o_p(1)$. □

Proof. Note that the $j$th summand in $K_g$ is a $K_h$ with $h$ as in (26). Hence (30) readily follows from (23). □


Lemmas 7.4.2 and 7.4.3 can be directly used to obtain the following

Theorem 7.4.3. In addition to the assumptions of Theorem 7.4.2, except (5.5.69), assume that $F$ is symmetric around 0, $G$ satisfies (5.3.8) and that (5.6a.13) holds. Then, for all $0 < B < \infty$,
(31) $\sup_{\|u\| \le B}|K_g^+(\rho + n^{-1/2}u) - K_g^{+*}(\rho + n^{-1/2}u)| = o_p(1)$. □


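Upon expanding the quadratic and using an appropriate analogue of (24) obtained when $h$ is as in (26), one can rewrite
$$K_g^*(t) = K_g^*(\rho) + 2(t - \rho)'n^{-1/2}B_n'\int \mathcal{W}(x) f(x)\,dG(x) + (t - \rho)'n^{-1}B_n'B_n(t - \rho)\int f^2\,dG, \quad t \in \mathbb{R}^p,$$
where $\mathcal{W} := (\mathcal{W}_1, \ldots, \mathcal{W}_p)'$. Now consider the r.v.'s in the second term. Recalling the definition of $\psi$ from (5.6a.2), one can rewrite
$$S_n := \int \mathcal{W}(x) f(x)\,dG(x) = n^{-1/2}\sum_{i=1}^n g_i\,[\psi(\varepsilon_i) - E\psi(\varepsilon_1)],$$
where $g_i'$ is the $i$th row of $\mathcal{G}$, i.e.,
(32) $g_i' := (g(X_{i-1}), g(X_{i-2}), \ldots, g(X_{i-p}))$, $1 \le i \le n$.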


Since $g_i$ is a function of $Y_{i-1}$, it is $\mathcal{F}_{i-1}$-measurable. Therefore, in view of (7a$_g$) and (5.5.68a), $\{(S_n, \mathcal{F}_{n-1}),\ n \ge 1\}$ is a mean zero square integrable martingale array. The same assumptions, and an argument like that in the proof of Lemma 7.3d.2, enable one to verify the applicability of Lemma A.3 in the Appendix to $S_n$. Hence it follows that
(33) $S_n \to_d N\big(0,\ G^*\tau^2(\int f^2\,dG)^2\big)$, $\quad G^* := E g_1 g_1'$, $\quad \tau^2 := \mathrm{Var}\,\psi(\varepsilon_1)/(\int f^2\,dG)^2$.
By the stationarity and the Ergodic Theorem, we also obtain
(34) $n^{-1}B_n \to B$, a.s., $\quad B := E\,n^{-1}B_n = E g_1 Y_0'$.
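As a sanity check on (34), one can simulate a stationary AR(1) process ($p = 1$) and watch the ergodic average $n^{-1}B_n$ settle near its limit $B = E\,g(X_0)X_0$. The sketch below is only an illustration under assumed settings ($N(0,1)$ errors, $\rho = 0.6$); it compares $g(x) = x$, whose limit is $E X_0^2 = 1/(1 - \rho^2)$, with $g(x) = \mathrm{sign}(x)$, whose limit is $E|X_0|$. The function names are hypothetical.

```python
import math
import random


def simulate_ar1(rho, n, seed, burn_in=500):
    """Return [X_0, X_1, ..., X_n] from X_i = rho * X_{i-1} + eps_i, eps_i ~ N(0, 1)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):  # burn-in so that X_0 is approximately stationary
        x = rho * x + rng.gauss(0.0, 1.0)
    path = [x]
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def bn_over_n(path, g):
    """n^{-1} B_n when p = 1, i.e. the average of g(X_{i-1}) * X_{i-1} over i = 1..n."""
    n = len(path) - 1
    return sum(g(path[i - 1]) * path[i - 1] for i in range(1, n + 1)) / n


rho = 0.6
path = simulate_ar1(rho, 50_000, seed=1)
sd0 = 1.0 / math.sqrt(1.0 - rho ** 2)  # stationary s.d. of X_0 when Var(eps) = 1

b_identity = bn_over_n(path, lambda v: v)                   # g(x) = x
b_sign = bn_over_n(path, lambda v: math.copysign(1.0, v))   # g(x) = sign(x)
print(b_identity, sd0 ** 2)                    # limit B = E X_0^2
print(b_sign, sd0 * math.sqrt(2.0 / math.pi))  # limit B = E|X_0|
```

With 50,000 observations both averages should be within a few hundredths of their limits; a shorter path makes the ergodic fluctuation visible.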


Consequently, the dispersion $K_g$ satisfies (5.4.A1) to (5.4.A3) with $\theta_0 = \rho$, $\delta_n = n^{1/2}$, $\mathcal{S}_n = n^{-1}B_n'S_n$, $W_n = n^{-2}B_n'B_n\int f^2\,dG$, $W = B'B\int f^2\,dG$, $\Sigma = B'G^*B\,\tau^2(\int f^2\,dG)^2$, and hence it is a u.l.a.n.q. dispersion.

In view of (24) applied to $h$ as in (26), the condition (5.4.A4) is trivially implied by (7a$_g$) and (5.5.69).

Recall, from Section 5.5, that in the linear regression setup the 
condition (5.4.A5) was shown to be implied by (5.5.11) and (5.5.12). In the
present situation, the role of T,, T, of (5.5.11) is being played by n “By f, 


n |B, — respectively. Thus, in view of (34) and (5.5.68a), an analogue of 
(5.5.11) would hold in the present case if we additionally assumed that B is 
positive definite. An exact analogue of (5.5.12) in the present case is 
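(35) Either
$\theta' g_i Y_{i-1}'\theta \ge 0$, $\forall\, 1 \le i \le n$, $\forall\, \theta \in \mathbb{R}^p$, $\|\theta\| = 1$, a.s.,
or
$\theta' g_i Y_{i-1}'\theta \le 0$, $\forall\, 1 \le i \le n$, $\forall\, \theta \in \mathbb{R}^p$, $\|\theta\| = 1$, a.s.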
We are now ready to state the following

Theorem 7.4.4. In addition to the assumptions of Theorem 7.4.2, assume that the $B$ of (34) is positive definite and that (35) holds. Then,
(36) $n^{1/2}(\hat\rho_g - \rho) = -\{n^{-1}B_n\int f^2\,dG\}^{-1}S_n + o_p(1)$.
Consequently,
(37) $n^{1/2}(\hat\rho_g - \rho) \to_d N\big(0,\ B^{-1}G^*(B')^{-1}\tau^2\big)$. □

Let $\hat\rho_X$ denote the estimator $\hat\rho_g$ when $g(x) = x$. Observe that in this case $G^* = B = E\,n^{-1}\mathcal{Y}'\mathcal{Y} = EY_0Y_0'$. Moreover, the assumption (35) is a priori satisfied, since then $\theta'g_iY_{i-1}'\theta = (\theta'Y_{i-1})^2 \ge 0$, and (7.3b.3), (7.3b.4) and (7b) imply that $EY_0Y_0'$ is positive definite. Consequently, we have obtained


Corollary 7.4.1. Suppose that the autoregression model (7.3b.3) and (7.3b.4) holds and that (5.5.68), (5.5.69), (7b), (8$_g$)–(10$_g$) with $g(x) = x$ hold. Then,
(38) $n^{1/2}(\hat\rho_X - \rho) \to_d N\big(0,\ (EY_0Y_0')^{-1}\tau^2\big)$. □


Remark 7.4.1. Asymptotic Optimality of $\hat\rho_X$. Because $B$ and $EY_0Y_0'$ are positive definite, and because $n^{-1}\mathcal{Y}'\mathcal{Y} \to EY_0Y_0'$, a.s., and (34) holds, there exists an $N_0$ such that $n^{-1}\mathcal{Y}'\mathcal{Y}$ and $n^{-1}B_n$ are positive definite for all $n \ge N_0$.

Recall the inequality (5.6a.8). Take $J = n^{-1/2}\mathcal{Y}$, $L = n^{-1/2}\mathcal{G}$ in that inequality to obtain
$$(n^{-1}B_n)^{-1}(n^{-1}\mathcal{G}'\mathcal{G})(n^{-1}B_n')^{-1} \ge (n^{-1}\mathcal{Y}'\mathcal{Y})^{-1}, \quad \forall\, n \ge N_0, \text{ a.s.},$$
with equality holding if, and only if, $\mathcal{G} \propto \mathcal{Y}$. Letting $n$ tend to infinity in this inequality yields
$$B^{-1}G^*(B')^{-1} \ge (EY_0Y_0')^{-1}.$$
We thus have proved the following:

(39) Among all estimators $\{\hat\rho_g;\ g$ satisfying (7a$_g$)–(10$_g$) for the given $(F, G)$ that satisfy (7b), (5.5.68), (5.5.69)$\}$, the one that minimizes the asymptotic variance is $\hat\rho_X$! □
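For $p = 1$ the matrix inequality behind (39) reduces to the Cauchy–Schwarz inequality $(\sum g X)^2 \le \sum g^2 \sum X^2$. Hence the sample analogue of $B^{-1}G^*(B')^{-1}$, namely $n\sum g^2(X_{i-1}) / (\sum g(X_{i-1})X_{i-1})^2$, can never fall below $n/\sum X_{i-1}^2$, with equality for $g(x) = x$. The sketch below checks this on one simulated AR(1) path; the candidate weight functions and the simulation settings are arbitrary illustrative choices, not taken from the text.

```python
import math
import random


def variance_factor(lags, g):
    """Sample analogue of B^{-1} G* (B')^{-1} for p = 1: n * sum g^2 / (sum g(X) X)^2."""
    n = len(lags)
    s_gg = sum(g(v) ** 2 for v in lags)
    s_gx = sum(g(v) * v for v in lags)
    return n * s_gg / s_gx ** 2


rng = random.Random(7)
x, lags = 0.0, []
for _ in range(5_000):
    lags.append(x)                      # store X_{i-1} before generating X_i
    x = 0.5 * x + rng.gauss(0.0, 1.0)   # AR(1) with rho = 0.5

bound = len(lags) / sum(v * v for v in lags)   # sample analogue of (E X_0^2)^{-1}
candidates = {
    "identity": lambda v: v,
    "sign": lambda v: math.copysign(1.0, v),
    "tanh": math.tanh,
    "clipped": lambda v: max(-1.0, min(1.0, v)),
}
factors = {name: variance_factor(lags, g) for name, g in candidates.items()}
for name, value in factors.items():
    print(f"{name:9s} {value:.4f} (lower bound {bound:.4f})")
```

Only the identity attains the bound, in line with the equality condition $\mathcal{G} \propto \mathcal{Y}$.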


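We shall now state analogous results for $\hat\rho_g^+$. Arguments for their proofs are similar to those appearing above and, hence, will not be given.

Theorem 7.4.5. In addition to the assumptions of Theorem 7.4.4, except (5.5.69), assume that $F$ is symmetric around 0, $G$ satisfies (5.3.8) and that (5.6a.13) holds. Then,
(40) $n^{1/2}(\hat\rho_g^+ - \rho) = -\{2n^{-1}B_n\int f^2\,dG\}^{-1}S_n^+ + o_p(1)$,
where
$$S_n^+ := \int \mathcal{W}^+(x) f(x)\,dG(x) = n^{-1/2}\sum_{i=1}^n g_i\,[\psi(-\varepsilon_i) - \psi(\varepsilon_i)],$$
with $\mathcal{W}^+ := (\mathcal{W}_1^+, \ldots, \mathcal{W}_p^+)'$. Consequently,
(41) $n^{1/2}(\hat\rho_g^+ - \rho) \to_d N\big(0,\ B^{-1}G^*(B')^{-1}\tau^2\big)$,
(42) $n^{1/2}(\hat\rho_X^+ - \rho) \to_d N\big(0,\ (EY_0Y_0')^{-1}\tau^2\big)$. □
Obviously, an optimality property like (39) holds here also.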
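To make $\hat\rho_X^+$ concrete, take $p = 1$, $g(x) = x$ and $G(x) = x$, and assume that $K_g^+(t)$ has the form suggested by (29), namely $\int [n^{-1/2}\sum_i X_{i-1}\{I(e_i \le x) - I(-e_i < x)\}]^2 dx$ with $e_i = X_i - tX_{i-1}$. Since $I(e \le x) - I(-e < x) = -\mathrm{sign}(e)\,I(|x| < |e|)$ for almost every $x$, the integral equals $(2/n)\sum_i\sum_k c_i c_k \min(r_i, r_k)$ with $c_i = X_{i-1}\mathrm{sign}(e_i)$ and $r_i = |e_i|$, and sorting by $r_i$ reduces the cost to $O(n \log n)$. The sketch below uses this identity with a crude grid search; the helper names, grid and simulation settings are hypothetical choices, not the book's algorithm.

```python
import random


def weights(t, xs):
    """Return c_i = X_{i-1} sign(e_i) and r_i = |e_i| with e_i = X_i - t X_{i-1}."""
    c, r = [], []
    for prev, cur in zip(xs, xs[1:]):
        e = cur - t * prev
        c.append(prev * ((e > 0) - (e < 0)))
        r.append(abs(e))
    return c, r


def k_plus(t, xs):
    """K^+(t) = (2/n) sum_{i,k} c_i c_k min(r_i, r_k), evaluated in O(n log n)."""
    c, r = weights(t, xs)
    total, tail = 0.0, 0.0
    # Visit indices from the largest |e_i| to the smallest; `tail` holds the sum of c
    # over the residuals already visited, i.e. those with larger |e|.
    for idx in sorted(range(len(c)), key=r.__getitem__, reverse=True):
        total += c[idx] * r[idx] * (c[idx] + 2.0 * tail)
        tail += c[idx]
    return 2.0 * total / len(c)


def k_plus_bruteforce(t, xs):
    """Direct O(n^2) evaluation of the same double sum, used as a cross-check."""
    c, r = weights(t, xs)
    n = len(c)
    return 2.0 / n * sum(c[i] * c[k] * min(r[i], r[k]) for i in range(n) for k in range(n))


rng = random.Random(5)
rho, x, xs = 0.3, 0.0, []
for _ in range(1_100):
    x = rho * x + rng.gauss(0.0, 1.0)      # symmetric (normal) errors, as Theorem 7.4.5 requires
    xs.append(x)
xs = xs[100:]                              # 1000 observations after burn-in

grid = [k / 100 for k in range(-95, 96)]   # candidate values of t in [-0.95, 0.95]
rho_plus = min(grid, key=lambda t: k_plus(t, xs))
print(rho_plus)
```

Note that this estimator needs no knowledge of $F$ beyond its symmetry, which is the practical appeal of $\hat\rho_X^+$ over $\hat\rho_X$.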


Remark 7.4.2. On assumptions for the asymptotic normality of $\hat\rho_X$, $\hat\rho_X^+$. If $G$ is a finite measure and $F$ has a uniformly continuous density, then it is not hard to see that (8$_g$)–(10$_g$), with $g(x) = x$, are all implied by (7b). Consider the following assumptions for general $G$:


(43) Ele|><o, Ee’>0. 

(44) As a function of $s \in \mathbb{R}$, $\int E\,X_{1-j}^2\|Y_0\| f(x + s\|Y_0\|)\,dG(x)$ is continuous at 0, $1 \le j \le p$.

(45) For every $\delta > 0$, $u \in \mathbb{R}^p$,
$$\int_{-\delta}^{\delta}\int E\big[\|Y_0\|\{f\big(x + n^{-1/2}(u'Y_0 + t\|Y_0\|)\big) - f(x + n^{-1/2}u'Y_0)\}\big]^2 dG(x)\,dt = o(1).$$

(46) For every $u \in \mathbb{R}^p$,
$$\int \Big[n^{-1}\sum_{i=1}^n |X_{i-j}|\,\|Y_{i-1}\| f(x + n^{-1/2}u'Y_{i-1})\Big]^2 dG(x) = O_p(1), \quad 1 \le j \le p.$$

An argument similar to the one used in verifying Claim 5.5.1 shows that (43) and (44) imply (8$_g$), while (5.5.68b), (45) and (46) imply (9$_g$) and (10$_g$).

In particular, if $G(x) = x$, then (5.5.68), (43) and the continuity of $f$ imply all of the above conditions, (5.5.69) and (5.6a.13). This is seen with the help of a version of Scheffé's Theorem. □


Remark 7.4.3. Asymptotic relative efficiency of $\hat\rho_X$, $\hat\rho_X^+$. Since their asymptotic variances are the same, we shall carry out the discussion in terms of $\hat\rho_X$ only, as the same applies to $\hat\rho_X^+$ under the additional assumption of the symmetry of $F$ and $G$.

Consider the case $p = 1$. Let $\sigma^2 = \mathrm{Var}(\varepsilon)$ and let $\hat\rho_{ls}$ denote the least squares estimator of $\rho_1$. Then it is well known that under (7b), $n^{1/2}(\hat\rho_{ls} - \rho_1) \to_d N(0, 1 - \rho_1^2)$. See, e.g., Anderson (1971). Also note that in this case $(EY_0Y_0')^{-1} = (1 - \rho_1^2)/\sigma^2$. Hence the asymptotic relative efficiency $e$ of $\hat\rho_X$ relative to $\hat\rho_{ls}$, obtained by taking the ratio of the inverses of their asymptotic variances, is
(47) $e = e(\hat\rho_X, \hat\rho_{ls}) = \sigma^2/\tau^2$.
Note that $e > 1$ means $\hat\rho_X$ is asymptotically more efficient than $\hat\rho_{ls}$.

Because $\sigma^2$ tends to be large relative to $\tau^2$ when $F$ has heavy tails, $\hat\rho_X$ is to be preferred to $\hat\rho_{ls}$ for heavy-tailed error d.f.'s $F$. Also note that if $G(x) = x$, then $\tau^2 = 1/\{12[\int f^2(x)\,dx]^2\}$ and $e = 12\sigma^2[\int f^2(x)\,dx]^2$. If $G$ is degenerate at 0, then $\tau^2 = 1/(4f^2(0))$ and $e = 4\sigma^2 f^2(0)$. These expressions are well known in connection with the Wilcoxon and median rank estimators of the slope parameters in linear regression models. For example, if $F$ is $N(0,1)$, then the first expression is $3/\pi$ while the second is $2/\pi$. See Lehmann (1975) for some bounds on these expressions. Similar conclusions remain valid for $p > 1$. □
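The two special cases of (47) are easy to evaluate numerically. The sketch below assumes $p = 1$ and three symmetric error laws chosen only for illustration (standard normal, standard logistic, Laplace with unit scale), computes $\int f^2$ by a plain midpoint rule, and evaluates $e = 12\sigma^2(\int f^2)^2$ for $G(x) = x$ and $e = 4\sigma^2 f^2(0)$ for $G$ degenerate at 0.

```python
import math


def integrate(fn, lo, hi, steps=100_000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(steps))


def are_wilcoxon(var, density, lo=-40.0, hi=40.0):
    """e = 12 sigma^2 (int f^2 dx)^2: the case G(x) = x in (47)."""
    return 12.0 * var * integrate(lambda x: density(x) ** 2, lo, hi) ** 2


def are_median(var, density):
    """e = 4 sigma^2 f(0)^2: the case G degenerate at 0 in (47)."""
    return 4.0 * var * density(0.0) ** 2


laws = {
    # name: (sigma^2 = Var(eps), density f)
    "normal": (1.0, lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)),
    "logistic": (math.pi ** 2 / 3.0,
                 lambda x: math.exp(-abs(x)) / (1.0 + math.exp(-abs(x))) ** 2),
    "laplace": (2.0, lambda x: 0.5 * math.exp(-abs(x))),
}
results = {name: (are_wilcoxon(v, f), are_median(v, f)) for name, (v, f) in laws.items()}
for name, (e_w, e_m) in results.items():
    print(f"{name:8s} G(x)=x: {e_w:.4f}   G=delta_0: {e_m:.4f}")
```

The closed forms are $3/\pi$ and $2/\pi$ for the normal, $\pi^2/9$ and $\pi^2/12$ for the logistic, and $3/2$ and $2$ for the Laplace law, so both m.d. estimators beat least squares in the heavier-tailed Laplace case.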


Remark 7.4.4. Least Absolute Deviation Estimator. As mentioned earlier, if we choose $g(x) = x$ and $G$ to be degenerate at 0, then $\hat\rho_X$ is the l.a.d. estimator, viz.,
(48) $\hat\rho_{lad} := \mathrm{argmin}\Big\{\sum_{j=1}^p\Big[\sum_{i=1}^n X_{i-j}\,\mathrm{sign}(X_i - t'Y_{i-1})\Big]^2;\ t \in \mathbb{R}^p\Big\}$.




See also (6). Because of its importance, we will now summarize sufficient conditions under which it is asymptotically normally distributed. Of course, we could use the stronger conditions (43)–(46), but they do not use the given information about $G$.

Clearly, (7b) implies (7a$_g$) when $g(x) = x$. Moreover, in this case the l.h.s. of (8$_g$) is
$$E\,X_{1-j}^2\,\big|F\big(n^{-1/2}(u'Y_0 + a\|Y_0\|)\big) - F(0)\big|,$$
which tends to 0 by the D.C.T., (7b) and the continuity of $F$, $1 \le j \le p$. Now consider (9$_g$). Assume the following:

(49) $F$ has a density $f$ that is continuous at 0 and $f(0) > 0$.

Recall from (7.3b.6) that under (7.3b.3), (7.3b.4) and (7b),
(50) $n^{-1/2}\max\{\|Y_{i-1}\|;\ 1 \le i \le n\} = o_p(1)$.
The r.v.'s involved in the l.h.s. of (9$_g$) in the present case are
$$n^{-1}\Big[\sum_{i=1}^n X_{i-j}^{\pm}\big\{F\big(n^{-1/2}(u'Y_{i-1} + \delta\|Y_{i-1}\|)\big) - F\big(n^{-1/2}(u'Y_{i-1} - \delta\|Y_{i-1}\|)\big)\big\}\Big]^2,$$
which, in view of (49), can be bounded above by
(51) $4\delta^2\Big[n^{-1}\sum_{i=1}^n |X_{i-j}|\,\|Y_{i-1}\| f(\eta_{ni})\Big]^2$,
where $\{\eta_{ni}\}$ are r.v.'s with $\eta_{ni} \in \big[n^{-1/2}(u'Y_{i-1} - \delta\|Y_{i-1}\|),\ n^{-1/2}(u'Y_{i-1} + \delta\|Y_{i-1}\|)\big]$, $1 \le i \le n$. Hence, by the stationarity and the ergodicity of the process $\{X_i\}$, (7b), (49) and (50) imply that the r.v.'s in (51) converge in probability to $4\delta^2[E|X_{1-j}|\,\|Y_0\| f(0)]^2$, $1 \le j \le p$. This verifies (9$_g$) in the present case. One similarly verifies (10$_g$).

Also note that here (5.5.68) is implied by (49), and (5.5.69) is trivially satisfied, as $\int F(1 - F)\,dG \le 1/4$ in the present case. We summarize the above discussion in

Corollary 7.4.3. Assume that the autoregression model (7.3b.3) and (7.3b.4) holds. In addition, assume that the error d.f. $F$ has finite second moment, $F(0) = 1/2$, and satisfies (49). Then,
$$n^{1/2}(\hat\rho_{lad} - \rho) \to_d N\big(0,\ (EY_0Y_0')^{-1}/(4f^2(0))\big),$$
where $\hat\rho_{lad}$ is defined at (48). □
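For $p = 1$, (48) minimises $S(t)^2$ with $S(t) = \sum_i X_{i-1}\,\mathrm{sign}(X_i - tX_{i-1})$, which is minus a subgradient of $\sum_i |X_i - tX_{i-1}|$. Each summand is nonincreasing in $t$, so $S$ is a nonincreasing step function and an approximate minimiser can be located by bisection on its sign. The sketch below uses this observation; the simulation settings (Laplace errors, $\rho = 0.5$, 1000 observations) and the function names are illustrative assumptions, and the least squares fit is included only for comparison.

```python
import random


def score(t, xs):
    """S(t) = sum_i X_{i-1} sign(X_i - t X_{i-1}); each term is nonincreasing in t."""
    total = 0.0
    for prev, cur in zip(xs, xs[1:]):
        r = cur - t * prev
        total += prev * ((r > 0) - (r < 0))
    return total


def lad_ar1(xs, lo=-1.0, hi=1.0, tol=1e-8):
    """Approximate minimiser of S(t)^2 (criterion (48) with p = 1) by bisection on the sign of S."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0:   # S still positive: the sign change lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def least_squares_ar1(xs):
    """Least squares estimate sum X_i X_{i-1} / sum X_{i-1}^2, for comparison."""
    num = sum(p * c for p, c in zip(xs, xs[1:]))
    den = sum(p * p for p in xs[:-1])
    return num / den


rng = random.Random(2024)
rho, x, xs = 0.5, 0.0, []
for _ in range(1_200):
    # Laplace(0, 1) error as the difference of two independent unit exponentials
    x = rho * x + (rng.expovariate(1.0) - rng.expovariate(1.0))
    xs.append(x)
xs = xs[200:]  # discard a burn-in stretch, keeping 1000 observations

rho_lad, rho_ls = lad_ar1(xs), least_squares_ar1(xs)
print(rho_lad, rho_ls)
```

Under these assumed Laplace errors, Corollary 7.4.3 gives asymptotic variance $(1 - \rho^2)/(4\sigma^2 f^2(0)) = 0.375$ for $n^{1/2}(\hat\rho_{lad} - \rho)$, against $1 - \rho^2 = 0.75$ for least squares.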
7.5. GOODNESS-OF-FIT TESTING.

Once again consider the AR($p$) model given by (7.3b.3), (7.3b.4), and let $F_0$ be a known d.f. Consider the problem of testing $H_0$: $F = F_0$. One of the common tests of $H_0$ is based on the Kolmogorov–Smirnov statistic
$$D_n := n^{1/2}\sup_x |F_n(x, \hat\rho) - F_0(x)|.$$
From Corollary 7.2.1 one readily has the following: if $F_0$ has a uniformly continuous density $f_0$, $f_0 > 0$ a.e., $\int x^2\,dF_0(x) < \infty$, and $\hat\rho$ satisfies (7.3c.(iv)) under $F_0$, then, under $H_0$,
$$D_n = \sup_x\Big|B(F_0(x)) + n^{1/2}(\hat\rho - \rho)'n^{-1}\sum_{i=1}^n Y_{i-1} f_0(x)\Big| + o_p(1).$$

In addition, if $EY_0 = 0 = E\varepsilon_1$, then $D_n \to_d \sup\{|B(t)|,\ 0 \le t \le 1\}$, thereby rendering $D_n$ asymptotically distribution free.
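$D_n$ can be computed from the AR residuals in $O(n \log n)$ time, because the supremum of $|F_n - F_0|$ is attained at the jumps of the residual empirical d.f. The sketch below assumes $p = 1$ and $F_0 = N(0,1)$, and uses the least squares estimate as $\hat\rho$, which is taken here (not proved) to meet the requirement placed on $\hat\rho$ in the text. Since $EY_0 = 0 = E\varepsilon_1$ in this simulation, $D_n$ should behave like $\sup|B(t)|$. The names `std_normal_cdf` and `ks_statistic` are hypothetical helpers.

```python
import math
import random


def std_normal_cdf(x):
    """Phi(x), the N(0, 1) distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(residuals, cdf=std_normal_cdf):
    """n^{1/2} sup_x |F_n(x) - F_0(x)| for the empirical d.f. F_n of `residuals`."""
    e = sorted(residuals)
    n = len(e)
    d = 0.0
    for i, v in enumerate(e, start=1):
        u = cdf(v)
        # F_n jumps from (i-1)/n to i/n at v; check both one-sided gaps
        d = max(d, i / n - u, u - (i - 1) / n)
    return math.sqrt(n) * d


rng = random.Random(11)
rho, x, xs = 0.4, 0.0, []
for _ in range(2_100):
    x = rho * x + rng.gauss(0.0, 1.0)   # errors drawn from F_0 = N(0, 1), i.e. under H_0
    xs.append(x)
xs = xs[100:]

rho_hat = sum(p * c for p, c in zip(xs, xs[1:])) / sum(p * p for p in xs[:-1])
residuals = [c - rho_hat * p for p, c in zip(xs, xs[1:])]
d_n = ks_statistic(residuals)
print(rho_hat, d_n)
```

Repeating the simulation over many seeds and comparing the empirical distribution of `d_n` with that of $\sup|B(t)|$ is a simple way to see the asymptotic distribution-freeness in action.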

Next, consider $H_{01}$: $F = N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$. In other words, $H_{01}$ states that the AR($p$) process is generated by some normal errors. Let $\hat\mu_n$, $\hat\sigma_n$ and $\hat\rho_n$ be estimators of $\mu$, $\sigma$, $\rho$, respectively. Define
$$\hat F_n(x) := n^{-1}\sum_{i=1}^n I(X_i \le x\hat\sigma_n + \hat\mu_n + \hat\rho_n'Y_{i-1}), \quad x \in \mathbb{R},$$
$$\hat D_n := n^{1/2}\sup_x |\hat F_n(x) - \Phi(x)|, \quad \Phi := N(0,1) \text{ d.f.}$$

Corollary 7.2.1 can be readily modified in a routine fashion to yield that if
$$n^{1/2}\{|\hat\mu_n - \mu| + |\hat\sigma_n - \sigma| + \|\hat\rho_n - \rho\|\} = O_p(1),$$
then
$$\hat D_n = \sup_x\big|B(\Phi(x)) + n^{1/2}\{(\hat\mu_n - \mu) + (\hat\sigma_n - \sigma)x\}\sigma^{-1}\varphi(x)\big| + o_p(1),$$
where $\varphi$ is the density of $\Phi$. Thus the asymptotic null distribution of $\hat D_n$ is similar to its analogue in the one sample location–scale model: the estimation of $\rho$ has no effect on the large sample null distribution of $\hat D_n$.

Clearly, similar conclusions can be drawn for other goodness-of-fit tests. In particular, we leave it as an exercise for an interested reader to investigate the large sample behaviour of the goodness-of-fit tests based on $L_2$-distances, analogous to the results obtained in Section 6.3. Lemma 6.3.1 and the results of the previous section are found useful here.


APPENDIX

We include here, for the sake of easy reference and without proofs, some results relevant to the weak convergence of processes in $D[0,1]$ and $C[0,1]$. Our source is the book by Billingsley (1968) (B) on Convergence of Probability Measures.

To begin with, let $\xi_1, \ldots, \xi_m$ be r.v.'s, not necessarily independent, and define
$$S_k := \sum_{i=1}^k \xi_i, \quad 1 \le k \le m; \qquad M_m := \max_{1 \le k \le m}|S_k|.$$

The following lemma is obtained by combining (12.5), (12.10) and Theorem 12.1 from pp 87–89 of (B).

Lemma A.1. Suppose there exist nonnegative numbers $u_1, u_2, \ldots, u_m$, a $\gamma > 0$ and an $\alpha > 1/2$ such that
$$E\{|S_j - S_i|^{\gamma}\,|S_k - S_j|^{\gamma}\} \le \Big(\sum_{r=i+1}^k u_r\Big)^{2\alpha}, \quad 0 \le i \le j \le k \le m.$$
Then, for all $\lambda > 0$,
$$P(M_m \ge \lambda) \le K_{\gamma,\alpha}\,\lambda^{-2\gamma}\Big(\sum_{r=1}^m u_r\Big)^{2\alpha} + P(|S_m| \ge \lambda/2),$$
where $K_{\gamma,\alpha}$ is a constant depending only on $\gamma$ and $\alpha$.
The following inequality is given as Corollary 8.3 in (B).

Lemma A.2. Let $\{\zeta(t), 0 \le t \le 1\}$ be a stochastic process on some probability space. Let $\delta > 0$, and let $0 = t_0 < t_1 < \cdots < t_r = 1$, with $t_i - t_{i-1} \ge \delta$, $2 \le i \le r - 1$, be a partition of $[0,1]$. Then, for all $\varepsilon > 0$,
$$P\Big(\sup_{|t - s| < \delta}|\zeta(t) - \zeta(s)| \ge 3\varepsilon\Big) \le \sum_{i=1}^r P\Big(\sup_{t_{i-1} \le t \le t_i}|\zeta(t) - \zeta(t_{i-1})| \ge \varepsilon\Big).$$

Definition: A sequence of stochastic processes $\{\zeta_n\}$ in $D[0,1]$ is said to converge weakly to a stochastic process $\zeta \in C[0,1]$ if every finite dimensional distribution of $\{\zeta_n\}$ converges weakly to that of $\zeta$ and if $\{\zeta_n\}$ is tight with respect to the uniform metric.

The following theorem gives sufficient conditions for the weak convergence of a sequence of stochastic processes in $D[0,1]$ to a limit in $C[0,1]$. It is essentially Theorem 15.5, p 127 of (B).
Theorem A.1. Let $\{\zeta_n(t), 0 \le t \le 1\}$ be a sequence of stochastic processes in $D[0,1]$. Suppose that $|\zeta_n(0)| = O_p(1)$ and that, for all $\varepsilon > 0$,
$$\lim_{\eta \to 0}\limsup_n P\Big(\sup_{|s - t| < \eta}|\zeta_n(s) - \zeta_n(t)| \ge \varepsilon\Big) = 0.$$
Then the sequence $\{\zeta_n(t), 0 \le t \le 1\}$ is tight, and if $\zeta$ is the weak limit of a subsequence $\{\zeta_{n'}(t), 0 \le t \le 1\}$, then $P(\zeta \in C[0,1]) = 1$.

The following theorem gives sufficient conditions for the weak convergence of a sequence of stochastic processes in $C[0,1]$ to a limit in $C[0,1]$. It is essentially Theorem 12.3, p 95 of (B).

Theorem A.2. Let $\{\zeta_n(t), 0 \le t \le 1\}$ be a sequence of stochastic processes in $C[0,1]$. Suppose that $|\zeta_n(0)| = O_p(1)$ and that there exist a $\gamma > 0$, an $\alpha > 1$ and a nondecreasing continuous function $F$ on $[0,1]$ such that
$$P(|\zeta_n(t) - \zeta_n(s)| \ge \lambda) \le \lambda^{-\gamma}|F(t) - F(s)|^{\alpha}$$
holds for all $s, t$ in $[0,1]$ and for all $\lambda > 0$.
Then the sequence $\{\zeta_n(t), 0 \le t \le 1\}$ is tight, and if $\zeta$ is the weak limit of a subsequence $\{\zeta_{n'}(t), 0 \le t \le 1\}$, then $P(\zeta \in C[0,1]) = 1$.


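We also need a central limit theorem for martingale arrays. Let $(\Omega, \mathcal{F}, P)$ be a probability space; let $\{\mathcal{F}_{n,i},\ 0 \le i \le n\}$ be an array of sub $\sigma$-fields such that $\mathcal{F}_{n,i-1} \subset \mathcal{F}_{n,i}$, $1 \le i \le n$, $n \ge 1$; let $X_{ni}$ be an $\mathcal{F}_{n,i}$-measurable r.v. with $EX_{ni}^2 < \infty$ and $E(X_{ni} \mid \mathcal{F}_{n,i-1}) = 0$; and let $S_{ni} := X_{n1} + \cdots + X_{ni}$. Then $\{S_{ni}, \mathcal{F}_{n,i};\ 1 \le i \le n,\ n \ge 1\}$ is called a zero-mean square-integrable martingale array with differences $\{X_{ni};\ 1 \le i \le n,\ n \ge 1\}$.

The central limit theorem we find useful is Corollary 3.1 of Hall and Heyde (1980), which we state here for easy reference.

Lemma A.3. Let $\{S_{ni}, \mathcal{F}_{n,i};\ 1 \le i \le n,\ n \ge 1\}$ be a zero-mean square-integrable martingale array with differences $\{X_{ni}\}$ satisfying the following conditions.
(1) For all $\varepsilon > 0$, $\sum_{i=1}^n E[X_{ni}^2 I(|X_{ni}| > \varepsilon) \mid \mathcal{F}_{n,i-1}] = o_p(1)$.
(2) $\sum_{i=1}^n E[X_{ni}^2 \mid \mathcal{F}_{n,i-1}] \to \eta^2$, in probability, for some r.v. $\eta^2$.
(3) $\mathcal{F}_{n,i} \subset \mathcal{F}_{n+1,i}$, $1 \le i \le n$, $n \ge 1$.

Then $S_{nn}$ converges in distribution to a r.v. $Z$ whose characteristic function at $t$ is $E\exp(-\eta^2 t^2/2)$, $t \in \mathbb{R}$. □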